ABSTRACT

Ontology-based data access is concerned with querying in- 
complete data sources in the presence of domain-specific 
knowledge provided by an ontology. A central notion in 
this setting is that of an ontology-mediated query, which 
is a database query coupled with an ontology. In this pa- 
per, we study several classes of ontology-mediated queries, 
where the database queries are given as some form of con- 
junctive query and the ontologies are formulated in de- 
scription logics or other relevant fragments of first-order 
logic, such as the guarded fragment and the unary-negation 
fragment. The contributions of the paper are three-fold. 
First, we characterize the expressive power of ontology- 
mediated queries in terms of fragments of disjunctive dat- 
alog. Second, we establish intimate connections between 
ontology-mediated queries and constraint satisfaction prob- 
lems (CSPs) and their logical generalization, MMSNP for- 
mulas. Third, we exploit these connections to obtain new 
results regarding (i) first-order rewritability and datalog- 
rewritability of ontology-mediated queries, (ii) P/NP di- 
chotomies for ontology-mediated queries, and (iii) the query 
containment problem for ontology-mediated queries. 

1. Introduction 

Ontologies are logical theories that formalize domain- 
specific knowledge, thereby making it available for machine 
processing. Recent years have seen an increasing interest 
in using ontologies in data-intensive applications, especially 
in the context of intelligent systems, the semantic web, and 
in data integration. A much studied scenario is that of an- 
swering queries over an incomplete database under the open 
world semantics, taking into account knowledge provided by 
an ontology [16, 15, 13]. We refer to this as ontology-based
data access (OBDA).

There are several important use cases for OBDA. A clas- 
sical one is to enrich an incomplete data source with back- 
ground knowledge, in order to obtain a more complete set 
of answers to a query. For example, if a medical patient 
database contains the facts that patient1 has finding Erythema
Migrans and patient2 has finding Lyme disease, and
the ontology provides the background knowledge that a finding
of Erythema Migrans is sufficient for diagnosing Lyme
disease, then both patient1 and patient2 can be returned
when querying for patients that have the diagnosis Lyme 
disease. The same setup is used in query answering under 
ontologies in the semantic web. OBDA can also be used to 
enrich the data schema (that is, the relation symbols used 
in the presentation of the data) with additional symbols to be 
used in a query. For example, a patient database may contain 
facts such as patient1 has diagnosis Lyme disease and patient2
has diagnosis Listeriosis, and an ontology could add
the knowledge that Lyme disease and Listeriosis are both
bacterial infections, thus enabling queries such as "return
all patients with a bacterial infection" despite the fact that
BacterialInfection is not part of the data schema. Especially
in the bio-medical domain, applications of this kind are fu- 
eled by the availability of comprehensive professional on- 
tologies such as Snomed CT and FMA. A third prominent 
application of OBDA is in data integration, where an on- 
tology can be used to provide a uniform view on multiple 
data sources [35]. This typically involves mappings from
the source schemas to the schema of the ontology, which we 
will not consider here. 

We may view the actual database query and the ontology 
as two components of one composite query, which we call an 
ontology-mediated query. OBDA can then be described as 
the problem of answering ontology-mediated queries. The 
database queries used in OBDA are typically unions of con- 
junctive queries, while the ontologies are typically specified 
in an ontology language that is either a description logic, or, 
more generally, a suitable fragment of first-order logic. For 
popular choices of ontology languages, the data complexity
of ontology-mediated queries can be coNP-complete.
For this reason, there has been extensive research on find- 
ing tractable classes of ontology-mediated queries, as well 
as on finding classes of ontology-mediated queries that are 
amenable to efficient query answering techniques [14, 25,
28]. In particular, restricted classes of ontology-mediated
queries have been identified that admit an FO-rewriting, that
is, for which there is an equivalent first-order query, or,
alternatively, that admit a datalog-rewriting. FO-rewritings
make it possible to answer ontology-mediated queries using
traditional database management systems. This is considered
one of the most promising approaches for OBDA, and
is currently the subject of significant research activity, see
for example [15, 24, 26, 27]. Ontology languages that admit
FO-rewritings include DL-Lite [15], and ontology languages
that admit datalog-rewritings include fragments of Datalog±
[12, 13].

The main aims of this paper are (i) to characterize the
expressive power of ontology-mediated queries, both in terms
of more traditional database query languages and from a
descriptive complexity perspective, and (ii) to make progress
towards complete and decidable classifications of
ontology-mediated queries, with respect to their data complexity,
as well as with respect to FO-rewritability and
datalog-rewritability.

We take an ontology-mediated query to be a triple
(S, O, q), where S is a data schema, O an ontology, and q
is a query. The data schema fixes the set of relation symbols
that can occur in the data. The ontology O is a logical
theory that may use the relation symbols from S as well as
additional symbols. As ontology languages, we consider a
range of standard description logics (DLs). We also consider
as ontology languages the guarded fragment (GF), the
unary negation fragment (UNFO), and the guarded negation
fragment (GNFO), which are large fragments of first-order
logic that embed many ontology languages such as DL-Lite
[15] and guarded tuple-generating dependencies [12]. The
query q can use any relation symbol that occurs in S or O.
As query languages for q, we focus on unions of conjunctive
queries (UCQs) and unary atomic queries (AQs). The latter
are of the form A(x), with A a unary relation symbol. They
correspond to what are traditionally called instance queries
in the OBDA literature (note that A may be a relation symbol
from O that is not part of the data schema). These are
the two most widely used query languages in
OBDA. The semantics of ontology-mediated queries is formally
defined in terms of certain answers. In the following,
we use (L, Q) to denote the query language that consists
of all ontology-mediated queries (S, O, q) with O specified
in the ontology language L and q specified in the query
language Q. For example, (GF,UCQ) refers to ontology-mediated
queries in which O is a GF-ontology and q is a
UCQ. We refer to such query languages (L, Q) as
ontology-mediated query languages (or OBDA languages).

In Section 3, we characterize the expressive power of
OBDA languages in terms of natural fragments of (negation-free)
disjunctive datalog. We first consider the basic
description logic ALC. We show that (ALC,UCQ) has the
same expressive power as monadic disjunctive datalog
(abbreviated MDDlog) and that (ALC,AQ) has the same
expressive power as unary queries defined in a syntactic
fragment of MDDlog that we call connected simple MDDlog.
Similar results hold for various description logics extending
ALC with, for example, inverse roles, role hierarchies,
and the universal role, all of which are standard operators
included in the W3C-standardized ontology language
OWL2 DL. Turning to other fragments of first-order logic,
we then show that (UNFO,UCQ) also has the same expressive
power as MDDlog, while (GF,UCQ) and (GNFO,UCQ)
are strictly more expressive and coincide in expressive power
with frontier-guarded disjunctive datalog, which is the
fragment of disjunctive datalog (DDlog) given by programs in
which, for every atom $\alpha$ in the head of a rule, there is an
atom $\beta$ in the rule body that contains all variables from $\alpha$.

In Sections 4 and 5, we study ontology-mediated queries
from a descriptive complexity perspective. In particular, we
establish an intimate connection between OBDA query
languages, constraint satisfaction problems, and MMSNP.
Recall that constraint satisfaction problems (CSPs) form a
subclass of the complexity class NP that, although it contains
NP-hard problems, is in certain ways more computationally
well-behaved. The widely known Feder-Vardi conjecture
[20] states that there is a dichotomy between PTime
and NP for the class of all CSPs, that is, each CSP is either
in PTime or NP-hard. In other words, if the conjecture is
correct, then there are no CSPs that are NP-intermediate in
the sense of Ladner's theorem. Monotone monadic strict NP
without inequality (abbreviated MMSNP) was introduced by
Feder and Vardi as a logical generalization of CSP that
enjoys similar computational properties [20]. In particular, it
was shown in [20, 29] that there is a dichotomy between
PTime and NP for MMSNP sentences if and only if the
Feder-Vardi conjecture holds.

In Section 4, we observe that (ALC,UCQ) and many other
OBDA languages based on UCQs have the same expressive
power as the query language coMMSNP, which we derive in
a natural way from MMSNP by admitting free variables and
applying complementation. In the spirit of descriptive
complexity theory, we say that (ALC,UCQ) captures coMMSNP.
In fact, this result is a consequence of the results in
Section 3 and the observation that MDDlog has the same
expressive power as coMMSNP. It has fundamental
consequences regarding the data complexity of OBDA query
languages and their query containment problem, which we
describe next.

First, we obtain that there is a dichotomy between PTime
and coNP for ontology-mediated queries from (ALC,UCQ)
if and only if the Feder-Vardi conjecture holds, and similarly
for many other OBDA languages based on UCQs. To
appreciate this result, recall that the data complexity of query
evaluation is a much studied subject in the OBDA literature,
and that considerable effort has been directed towards
identifying tractable classes of ontology-mediated queries.
Ideally, one would like to classify the data complexity of
every ontology-mediated query within a given OBDA language
such as (ALC,UCQ). Our aforementioned result ties
this task to proving the Feder-Vardi conjecture. Significant
progress has been made in understanding the complexity of
CSPs and MMSNPs [11, 9, 30], and the connection established
in this paper facilitates the transfer of techniques and
results from CSP and MMSNP to analyse the data complexity
of query evaluation in (ALC,UCQ). We also consider
the standard extension ALCF of ALC with functional
roles and note that, for query evaluation in (ALCF,AQ),
there is provably no dichotomy between PTime and coNP
unless PTime = NP. To also establish a counterpart
of (GF,UCQ) and (GNFO,UCQ) in the MMSNP world,
we consider guarded monotone strict NP (abbreviated GMSNP)
as a generalization of MMSNP; specifically, GMSNP
is obtained from MMSNP by allowing guarded second-order
quantification in the place of monadic second-order
quantification, similarly as in the transition from MDDlog to
frontier-guarded disjunctive datalog. In fact, the resulting
query language coGMSNP has the same expressive power as
frontier-guarded disjunctive datalog, and therefore, in
particular, (GF,UCQ) and (GNFO,UCQ) capture coGMSNP.
While it follows from our results in Section 3 that GMSNP
is strictly more expressive than MMSNP, we are able
to show that GMSNP has a dichotomy between PTime and
NP iff the Feder-Vardi conjecture holds. Consequently, our
observations regarding dichotomies for query evaluation
apply also to (GF,UCQ) and (GNFO,UCQ).

The second application of the connection between OBDA
and MMSNP concerns query containment. It was shown in
[20] that containment between MMSNP sentences is decidable.
We use this result to prove that query containment is
decidable for many OBDA languages based on UCQs,
including (ALC,UCQ) and (GF,UCQ). Note that this refers to
a very general form of query containment in OBDA, as
recently introduced and studied in [8]. For (ALCF,AQ), this
problem (and all other decision problems discussed below)
turns out to be undecidable.

In Section 5, we study OBDA languages based on atomic
queries and establish a tight connection to generalized
constraint satisfaction problems (generalized CSPs), that is,
constraint satisfaction problems given by a finite collection
of structures. This connection is most easily stated for
Boolean atomic queries (BAQs): we prove that (ALC,BAQ)
captures the query language that consists of all Boolean
queries definable as the complement of a CSP. Similarly,
(ALC,AQ) with the universal role captures the query language
that consists of all unary queries definable as the
complement of a generalized CSP, which is given by finite
collections of structures enriched with a constant symbol.
We then proceed to transfer results from the CSP literature
to the ontology-mediated query languages (ALC,BAQ)
and (ALC,AQ). First, we obtain that the existence
of a PTime/coNP dichotomy for these OBDA languages
is equivalent to the Feder-Vardi conjecture. Then we show
that query containment is not only decidable (as already
follows from the aforementioned connection with MMSNP),
but, in fact, NExpTime-complete. Finally, taking advantage
of recent results for CSPs [21, 22, 10], we are able to show
that FO-rewritability and datalog-rewritability, as properties
of ontology-mediated queries, are decidable and
NExpTime-complete in the case of (ALC,AQ) and (ALC,BAQ).



For simplicity, we focused on the description logic ALC
when describing the results obtained in Sections 4 and 5.
Making use of the equivalences between DL-based OBDA
languages established in Section 3, we also show that the
same results hold for many extensions of ALC.

Related Work. A connection between query answering in
DLs and the negation-free fragment of disjunctive datalog
was first discovered and utilized in the influential [34, 25],
see also [37]. This research is concerned with
answer-preserving translations of ontology-mediated queries into
disjunctive datalog. In contrast to the current paper, it does
not consider the expressive power of ontology-mediated
queries, nor their descriptive complexity. A connection
between OBDA based on DLs and CSPs was first found and
exploited in [32], in a setup that is different from the one
studied in this paper. In particular, instead of focusing on
ontology-mediated queries that consist of a data schema, an
ontology, and a database query, [32] concentrates on ontologies
while quantifying universally over all database queries
and without fixing a data schema. It establishes links to
the Feder-Vardi conjecture that are incomparable to the ones
found in this paper, and does not consider the expressive
power and descriptive complexity of queries used in OBDA.

2. Preliminaries 

Schemas, Instances, and Queries. A schema is a finite
collection $\mathbf{S} = (S_1, \ldots, S_k)$ of relation symbols with
associated arity. A fact over S is an expression of the form
$S(a_1, \ldots, a_n)$ where $S \in \mathbf{S}$ is an n-ary relation symbol, and
$a_1, \ldots, a_n$ are elements of some fixed, countably infinite set
const of constants. An instance D over S is a finite set of
facts over S. The active domain adom(D) of D is the set of
all constants that occur in the facts of D.

A query over S is semantically defined as a mapping q
that associates with every instance D over S a set of answers
$q(D) \subseteq \mathrm{adom}(D)^n$, where $n \geq 0$ is the arity of q. If n = 0,
then we say that q is a Boolean query, and we write $q(D) = 1$
if $() \in q(D)$ and $q(D) = 0$ otherwise.

A prominent way of specifying queries is by means
of first-order logic (FO). Specifically, each schema S and
domain-independent FO-formula $\varphi(x_1, \ldots, x_n)$ that uses
only relation names from S (and, possibly, equality) give
rise to the n-ary query $q_{\varphi,\mathbf{S}}$, defined by setting for all
S-instances D,

$$q_{\varphi,\mathbf{S}}(D) = \{(a_1, \ldots, a_n) \mid D \models \varphi[a_1, \ldots, a_n]\}.$$

To simplify exposition, we assume that FO-queries do not 
contain constants. We use FOQ to denote the set of all first- 
order queries, as defined above. Similarly, we use CQ and 
UCQ to refer to the class of conjunctive queries and unions 
of conjunctive queries, defined as usual and allowing the use 
of equality. AQ denotes the set of atomic queries, which are
of the form A(x) with A a unary relation symbol. Each of
these is called a query language, which is defined abstractly
as a set of queries. Besides FOQ, CQ, UCQ, and AQ, we
consider various other query languages introduced later,
including ontology-mediated ones and variants of datalog.

Two queries $q_1$ and $q_2$ over S are equivalent, written
$q_1 \equiv q_2$, if for every S-instance D, we have $q_1(D) = q_2(D)$. We
say that query language $Q_2$ is at least as expressive as query
language $Q_1$, written $Q_1 \preceq Q_2$, if for every query $q_1 \in Q_1$
over some schema S, there is a query $q_2 \in Q_2$ over S with
$q_1 \equiv q_2$. $Q_1$ and $Q_2$ have the same expressive power if
$Q_1 \preceq Q_2$ and $Q_2 \preceq Q_1$.

Ontology-Mediated Queries. We introduce the fundamentals
of ontology-based data access. An ontology language L
is a fragment of first-order logic (i.e., a set of FO sentences),
and an L-ontology O is a finite set of sentences from L. We
introduce various ontology languages throughout the paper,
including description logics and the guarded fragment.

An ontology-mediated query over a schema S is a triple
(S, O, q), where O is an ontology and q a query over
$\mathbf{S} \cup \mathrm{sig}(\mathcal{O})$, with sig(O) the set of relation symbols used in O.
Here, we call S the data schema. Note that the ontology 
can introduce symbols that are not in the data schema. As 
explained in the introduction, this allows the ontology to en- 
rich the schema of the query q. Of course, we also do not 
require that every relation of the data schema needs to occur 
in the ontology. We have explicitly included S in the specifi- 
cation of the ontology-mediated query to emphasize that the 
ontology-mediated query is interpreted as a query over S. 

The semantics of an ontology-mediated query is given in
terms of certain answers, defined next. A finite relational
structure over a schema S is a pair $\mathfrak{B} = (\mathrm{dom}, D)$ where
dom is a finite set called the domain of $\mathfrak{B}$ and D is an
instance over S with $\mathrm{adom}(D) \subseteq \mathrm{dom}$. When S is understood,
we use Mod(O) to denote the set of all finite relational
structures $\mathfrak{B}$ over $\mathbf{S} \cup \mathrm{sig}(\mathcal{O})$ such that $\mathfrak{B} \models \mathcal{O}$. Let
(S, O, q) be an ontology-mediated query with q of arity n.
The certain answers to q on an S-instance D given O is the
set $\mathrm{cert}_{q,\mathcal{O}}(D)$ of tuples $\mathbf{a} \in \mathrm{adom}(D)^n$ such that for all
$(\mathrm{dom}, D') \in \mathrm{Mod}(\mathcal{O})$ with $D \subseteq D'$ (that is, all models of
O that extend D), we have $\mathbf{a} \in q(D')$.

Note that all ontology languages considered in this paper 
enjoy finite controllability, and thus finite relational struc- 
tures can be replaced with unrestricted ones without chang- 
ing the certain answers Q [5|. Every ontology-mediated 
query Q = (S, O, q) can be semantically interpreted as
a query $q_Q$ over S by setting $q_Q(D) = \mathrm{cert}_{q,\mathcal{O}}(D)$ for
all S-instances D. Taking this observation one step further,
every choice of an ontology language L and query language
Q gives rise to a query language, which we denote
by (L, Q), defined as the set of queries $q_{(\mathbf{S},\mathcal{O},q)}$ with S a
schema, O an L-ontology, and $q \in Q$ a query over $\mathbf{S} \cup \mathrm{sig}(\mathcal{O})$. We
refer to query languages (L, Q) as ontology-mediated query
languages (or OBDA languages for short).

  $\top^*(x)$              =  $\top$
  $\bot^*(x)$              =  $\bot$
  $A^*(x)$                 =  $A(x)$
  $(\neg C)^*(x)$          =  $\neg C^*(x)$
  $(C \sqcap D)^*(x)$      =  $C^*(x) \wedge D^*(x)$
  $(C \sqcup D)^*(x)$      =  $C^*(x) \vee D^*(x)$
  $(\exists R.C)^*(x)$     =  $\exists y\,(R(x, y) \wedge C^*(y))$
  $(\forall R.C)^*(x)$     =  $\forall y\,(R(x, y) \rightarrow C^*(y))$

Table 2: First-order translation of ALC-concepts

Example 1 The left-hand side of Table 1 shows an ontology
O that is formulated in the guarded fragment of FO.
Consider the ontology-mediated query (S, O, q) with S =
{ErythemaMigrans, LymeDisease, CancerInFamily, finding,
diagnosis, child} and q(x) the unary conjunctive query

$$\exists y\,(\mathsf{diagnosis}(x, y) \wedge \mathsf{BacterialInfection}(y)).$$

For the instance D over S that consists of the facts

  finding(pat1, jan12find1)     ErythemaMigrans(jan12find1)
  diagnosis(pat2, may7diag2)    Listeriosis(may7diag2)

we have $\mathrm{cert}_{q,\mathcal{O}}(D) = \{\mathsf{pat1}, \mathsf{pat2}\}$.
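The certain answers of Example 1 can be reproduced with a small forward-chaining sketch. It is only an illustration under stated assumptions: the three axioms of the example ontology happen to be expressible as rules without disjunction in the head, so saturating the instance with one anonymous witness (a labeled null) per existential axiom is enough here. This does not carry over to ALC or GF ontologies in general. The function names `chase_example` and `certain_answers` and the tuple encoding of facts are hypothetical.

```python
from itertools import count


def chase_example(instance):
    """Saturate a set of facts under the three axioms of the example ontology."""
    facts = set(instance)
    fresh = count()      # numbering for labeled nulls (anonymous diagnoses)
    witnessed = set()    # patients that already received an anonymous diagnosis
    while True:
        new = set()
        for fact in facts:
            pred, args = fact[0], fact[1:]
            # finding(x, y) and ErythemaMigrans(y) imply a diagnosis z of x with LymeDisease(z)
            if pred == "finding" and ("ErythemaMigrans", args[1]) in facts:
                if args[0] not in witnessed:
                    witnessed.add(args[0])
                    null = f"_null{next(fresh)}"
                    new |= {("diagnosis", args[0], null), ("LymeDisease", null)}
            # LymeDisease(x) or Listeriosis(x) implies BacterialInfection(x)
            if pred in ("LymeDisease", "Listeriosis"):
                new.add(("BacterialInfection", args[0]))
            # CancerInFamily(x) and child(x, y) imply CancerInFamily(y)
            if pred == "child" and ("CancerInFamily", args[0]) in facts:
                new.add(("CancerInFamily", args[1]))
        if new <= facts:
            return facts
        facts |= new


def certain_answers(instance):
    """Answers to q(x) = exists y (diagnosis(x, y) and BacterialInfection(y))."""
    chased = chase_example(instance)
    adom = {c for fact in instance for c in fact[1:]}
    return {
        fact[1]
        for fact in chased
        if fact[0] == "diagnosis"
        and fact[1] in adom                      # answers must be constants of the input
        and ("BacterialInfection", fact[2]) in chased
    }


D = {
    ("finding", "pat1", "jan12find1"),
    ("ErythemaMigrans", "jan12find1"),
    ("diagnosis", "pat2", "may7diag2"),
    ("Listeriosis", "may7diag2"),
}
print(sorted(certain_answers(D)))  # ['pat1', 'pat2']
```

Note that pat1 is returned although the instance contains no diagnosis fact for pat1: the witness is an anonymous element that exists in every model of the ontology.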

Description Logics for Specifying Ontologies. In descrip- 
tion logic, schemas are generally restricted to relations of 
arity one and two, called concept names and role names, re- 
spectively. For brevity, we speak of binary schemas. We 
briefly review the basic description logic ALC. Relevant
extensions of ALC will be introduced later on in the paper.
An ALC-concept is formed according to the syntax rule

$$C, D ::= A \mid \top \mid \bot \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C$$

where A ranges over concept names and R over role names.
An ALC-ontology O is a finite set of concept inclusions
$C \sqsubseteq D$, with C and D ALC-concepts. We define the
semantics of ALC-concepts by translation to FO-formulas
with one free variable, as shown in Table 2. An ALC-ontology
O then translates into the set of FO-sentences
$\mathcal{O}^* = \{\forall x\,(C^*(x) \rightarrow D^*(x)) \mid C \sqsubseteq D \in \mathcal{O}\}$. On the
right-hand side of Table 1, we show the ALC-version of the
guarded fragment ontology displayed on the left-hand side.
Note that, although the translation is equivalence preserving
in this case, in general the guarded fragment is a more
expressive ontology language than ALC. Throughout the paper,
we do not explicitly distinguish between a DL ontology
and its translation into FO.
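The translation of ALC-concepts and concept inclusions into first-order logic can be sketched as a short recursive function. The encoding of concepts as nested tuples and the names `translate` and `translate_inclusion` are assumptions made for this illustration; the output is a plain string, and bound variables are numbered by nesting depth to avoid clashes.

```python
def translate(concept, var="x", depth=0):
    """Return the first-order translation C*(var) of an ALC concept as a string.

    Concepts are nested tuples (a hypothetical encoding):
    ("top",), ("bot",), ("name", A), ("not", C), ("and", C, D),
    ("or", C, D), ("exists", R, C), ("forall", R, C).
    """
    op = concept[0]
    if op == "top":
        return "⊤"
    if op == "bot":
        return "⊥"
    if op == "name":
        return f"{concept[1]}({var})"
    if op == "not":
        return f"¬{translate(concept[1], var, depth)}"
    if op in ("and", "or"):
        symbol = "∧" if op == "and" else "∨"
        left = translate(concept[1], var, depth)
        right = translate(concept[2], var, depth)
        return f"({left} {symbol} {right})"
    if op in ("exists", "forall"):
        fresh = f"y{depth}"  # one fresh variable per quantifier nesting level
        inner = translate(concept[2], fresh, depth + 1)
        role_atom = f"{concept[1]}({var},{fresh})"
        if op == "exists":
            return f"∃{fresh}({role_atom} ∧ {inner})"
        return f"∀{fresh}({role_atom} → {inner})"
    raise ValueError(f"unknown constructor: {op}")


def translate_inclusion(lhs, rhs):
    """Translate a concept inclusion lhs ⊑ rhs into an FO sentence."""
    return f"∀x({translate(lhs)} → {translate(rhs)})"


lhs = ("exists", "finding", ("name", "ErythemaMigrans"))
rhs = ("exists", "diagnosis", ("name", "LymeDisease"))
print(translate_inclusion(lhs, rhs))
```

Applied to the first inclusion of the example ontology, the sketch yields the corresponding guarded sentence, up to the naming of bound variables.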

We remark that, from a DL perspective, the above defini- 
tions of instances and certain answers correspond to making 
the standard name assumption (SNA) in ABoxes, which in 
particular implies the unique name assumption. We make 
the SNA only to facilitate uniform presentation; the SNA is 
inessential for the results presented in this paper.

Example 2 Let O and S be as in Example 1. For
$q_1(x) = \mathsf{BacterialInfection}(x)$, the ontology-mediated
query $(\mathbf{S}, \mathcal{O}, q_1)$ is equivalent to the UCQ
$\mathsf{LymeDisease}(x) \vee \mathsf{Listeriosis}(x)$. For $q_2(x) = \mathsf{CancerInFamily}(x)$, the
ontology-mediated query $(\mathbf{S}, \mathcal{O}, q_2)$ is equivalent to the
datalog program

  $P(x) \leftarrow \mathsf{CancerInFamily}(x)$
  $P(x) \leftarrow \mathsf{child}(y, x) \wedge P(y)$
  $\mathsf{goal}(x) \leftarrow P(x)$

but not to any FOQ.
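To make the recursive nature of the second query in Example 2 concrete, the following sketch evaluates the datalog program (P holds for every element with CancerInFamily, P propagates from y to x along child(y, x), and goal returns P) by naive fixpoint iteration. The function name and the tuple encoding of facts are hypothetical.

```python
def cancer_in_family_answers(instance):
    """Naive bottom-up evaluation of the datalog program of Example 2."""
    # P(x) <- CancerInFamily(x)
    derived = {fact[1] for fact in instance if fact[0] == "CancerInFamily"}
    child = {(fact[1], fact[2]) for fact in instance if fact[0] == "child"}
    changed = True
    while changed:
        changed = False
        for y, x in child:
            # P(x) <- child(y, x) and P(y)
            if y in derived and x not in derived:
                derived.add(x)
                changed = True
    # goal(x) <- P(x)
    return derived


family = {
    ("CancerInFamily", "a"),
    ("child", "a", "b"),
    ("child", "b", "c"),
    ("child", "d", "e"),
}
print(sorted(cancer_in_family_answers(family)))  # ['a', 'b', 'c']
```

The number of iterations grows with the length of the longest child chain in the data, which gives some intuition for why no fixed first-order query can express the same result.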
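  (1) FO:   $\forall x\,(\exists y\,(\mathsf{finding}(x, y) \wedge \mathsf{ErythemaMigrans}(y)) \rightarrow \exists y\,(\mathsf{diagnosis}(x, y) \wedge \mathsf{LymeDisease}(y)))$
      ALC:  $\exists \mathsf{finding}.\mathsf{ErythemaMigrans} \sqsubseteq \exists \mathsf{diagnosis}.\mathsf{LymeDisease}$

  (2) FO:   $\forall x\,((\mathsf{LymeDisease}(x) \vee \mathsf{Listeriosis}(x)) \rightarrow \mathsf{BacterialInfection}(x))$
      ALC:  $\mathsf{LymeDisease} \sqcup \mathsf{Listeriosis} \sqsubseteq \mathsf{BacterialInfection}$

  (3) FO:   $\forall x\,(\mathsf{CancerInFamily}(x) \rightarrow \forall y\,(\mathsf{child}(x, y) \rightarrow \mathsf{CancerInFamily}(y)))$
      ALC:  $\mathsf{CancerInFamily} \sqsubseteq \forall \mathsf{child}.\mathsf{CancerInFamily}$

Table 1: Example ontology, presented in (the guarded fragment of) first-order logic and the DL ALC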



3. OBDA and Disjunctive Datalog 

We show that for many OBDA languages, there is a natu- 
ral fragment of disjunctive datalog that has exactly the same 
expressive power. 

A disjunctive datalog rule $\rho$ has the form

$$S_1(\mathbf{x}_1) \vee \cdots \vee S_m(\mathbf{x}_m) \leftarrow R_1(\mathbf{y}_1) \wedge \cdots \wedge R_n(\mathbf{y}_n)$$

with $m \geq 0$ and $n > 0$. We refer to $S_1(\mathbf{x}_1) \vee \cdots \vee S_m(\mathbf{x}_m)$ as the
head of $\rho$, and to $R_1(\mathbf{y}_1), \ldots, R_n(\mathbf{y}_n)$ as the body of $\rho$.
Every variable that occurs in the head of a rule $\rho$ is required to
also occur in the body of $\rho$. Empty rule heads are denoted $\bot$.
A disjunctive datalog (DDlog) program $\Pi$ is a finite set of
disjunctive datalog rules with a selected goal predicate goal
that does not occur in rule bodies and only in goal rules of
the form $\mathsf{goal}(\mathbf{x}) \leftarrow R_1(\mathbf{x}_1) \wedge \cdots \wedge R_n(\mathbf{x}_n)$. The arity of $\Pi$
is the arity of the goal relation. Relation symbols that occur
in the head of at least one rule of $\Pi$ are intensional (IDB)
predicates of $\Pi$, and all remaining symbols in $\Pi$ are
extensional (EDB) predicates. We use adom(x) in rule bodies as
a shorthand for "x is in the active domain of the EDB
predicates"; as explained in the appendix, every DDlog program
that contains adom(x) can be translated into an equivalent
one without an occurrence of this abbreviation.

Every DDlog program $\Pi$ of arity n naturally defines an
n-ary query $q_\Pi$ over the schema S that consists of the EDB
predicates of $\Pi$: for every instance D over S, we have

$$q_\Pi(D) = \{\mathbf{a} \in \mathrm{adom}(D)^n \mid \mathsf{goal}(\mathbf{a}) \in D' \text{ for all } D' \in \mathrm{Mod}(\Pi) \text{ with } D \subseteq D'\}.$$

Here, $\mathrm{Mod}(\Pi)$ denotes the set of all instances over S' that
satisfy all rules in $\Pi$, with S' the set of all IDB and EDB
predicates in $\Pi$. Note that the
DDlog programs considered in this paper are negation-free.
Restricted to this fragment, there is no difference between
the different semantics of DDlog studied, e.g., in [17].
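The semantics of negation-free DDlog programs described above can be illustrated by a brute-force evaluator. This is a sketch under assumptions, not an efficient procedure: it enumerates every set of IDB facts over the active domain of the input, keeps those extensions that satisfy all rules, and intersects their goal facts. Restricting attention to models that add only IDB facts over the active domain is a simplification that is adequate for negation-free programs, because the restriction of any model to such facts is again a model. The rule encoding, the names `holds`, `is_model`, and `evaluate`, and the two-coloring example program are all hypothetical.

```python
from itertools import chain, combinations, product


def holds(atom, assignment, facts):
    """Check whether an atom (predicate, variables) is true under an assignment."""
    pred, variables = atom
    if pred == "adom":  # shorthand: assigned values always come from the active domain
        return True
    return (pred, *[assignment[v] for v in variables]) in facts


def is_model(rules, facts, adom):
    """A rule (head, body) is violated if its body holds and no head atom does."""
    for head, body in rules:
        variables = sorted({v for _, vs in body for v in vs})
        for values in product(adom, repeat=len(variables)):
            assignment = dict(zip(variables, values))
            body_holds = all(holds(a, assignment, facts) for a in body)
            head_holds = any(holds(a, assignment, facts) for a in head)
            if body_holds and not head_holds:
                return False
    return True


def evaluate(rules, idb_arity, instance):
    """Return the goal tuples that occur in every model extending the instance."""
    adom = sorted({c for fact in instance for c in fact[1:]})
    candidates = [
        (p, *t) for p, k in idb_arity.items() for t in product(adom, repeat=k)
    ]
    subsets = chain.from_iterable(
        combinations(candidates, r) for r in range(len(candidates) + 1)
    )
    answers = None
    for extra in subsets:
        facts = set(instance) | set(extra)
        if is_model(rules, facts, adom):
            goals = {f[1:] for f in facts if f[0] == "goal"}
            answers = goals if answers is None else answers & goals
    if answers is None:  # no model at all: every tuple is vacuously an answer
        answers = set(product(adom, repeat=idb_arity["goal"]))
    return answers


# Boolean MDDlog program: goal holds iff the graph E has no proper 2-coloring.
rules = [
    ([("Red", ("x",)), ("Green", ("x",))], [("adom", ("x",))]),
    ([("goal", ())], [("Red", ("x",)), ("E", ("x", "y")), ("Red", ("y",))]),
    ([("goal", ())], [("Green", ("x",)), ("E", ("x", "y")), ("Green", ("y",))]),
]
idb_arity = {"Red": 1, "Green": 1, "goal": 0}

triangle = {("E", "a", "b"), ("E", "b", "c"), ("E", "c", "a")}
edge = {("E", "a", "b")}
print(evaluate(rules, idb_arity, triangle))  # {()}  (Boolean query is true)
print(evaluate(rules, idb_arity, edge))      # set() (Boolean query is false)
```

The example program expresses the complement of graph 2-colorability, which gives a first impression of how disjunctive rule heads relate such programs to constraint satisfaction problems.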

A monadic disjunctive datalog (MDDlog) program is a DDlog
program in which all IDB predicates, with the possible
exception of goal, are monadic. We use MDDlog to denote
the query language consisting of all queries defined by an
MDDlog program.

3.1 Ontologies Specified in Description Logics 

We show that (ALC,UCQ) has the same expressive power
as MDDlog and identify a fragment of MDDlog that has the
same expressive power as (ALC,AQ). In addition, we consider
the extensions of ALC with inverse roles, role hierarchies,
transitive roles, and the universal role, which we also
relate to MDDlog and its fragments. To match the syntax of
ALC and its extensions, we generally assume schemas to be
binary throughout this section.¹

(ALC,UCQ) and MDDlog. The first main result of this
section is the following.

Theorem 1 (ALC,UCQ) and MDDlog have the same
expressive power.

Proof (sketch). We first show how to translate an
ontology-mediated query $(\mathbf{S}, \mathcal{O}, q) \in$ (ALC,UCQ) into an equivalent
MDDlog program. Let (S, O, q) be given. Let sub(O) be
the set of subconcepts (that is, syntactic subexpressions) of
concepts that occur in O, and let cl(O, q) denote sub(O)
extended with all CQs that have at most one free variable, use
only symbols from q, and whose length is bounded by the
length of q. A type (for O and q) is a subset of cl(O, q). We
introduce a fresh unary relation symbol $P_\tau$ for every type $\tau$,
and we denote by S' the schema that extends S with these
additional symbols. Call a relational structure $\mathfrak{B}$ over
$\mathbf{S}' \cup \mathrm{sig}(\mathcal{O})$ type-coherent if $P_\tau(d) \in \mathfrak{B}$ implies that
(i) $\tau = \{C \in \mathrm{cl}(\mathcal{O}, q) \mid \mathfrak{B} \models C[d]\}$ and
(ii) $\mathfrak{B} \models q'$ iff $q' \in \tau$ for all Boolean CQs $q' \in \mathrm{cl}(\mathcal{O}, q)$.

Let $k \geq 2$ be any number greater than or equal to the width
of q, that is, the number of variables that occur in q. By a
diagram, we mean a conjunction $\delta(x_1, \ldots, x_n)$ of atomic
formulas over the schema S', with $n \leq k$ variables. A
diagram $\delta(\mathbf{x})$ is realizable if there exists a type-coherent
$\mathfrak{B} \in \mathrm{Mod}(\mathcal{O})$ that satisfies $\exists \mathbf{x}\,\delta(\mathbf{x})$. A diagram $\delta(\mathbf{x})$ implies
$q(\mathbf{x}')$, with $\mathbf{x}'$ a sequence of variables from $\mathbf{x}$, if every
type-coherent $\mathfrak{B} \in \mathrm{Mod}(\mathcal{O})$ that satisfies $\delta(\mathbf{x})$ under some
variable assignment satisfies $q(\mathbf{x}')$ under the same assignment.

In the MDDlog program that we aim to construct, the relation
symbols $P_\tau$ are used as IDB relations, and the symbols
from S are used as EDB relations. The MDDlog program $\Pi$
consists of the following collections of rules:

  $\bigvee_{\tau \subseteq \mathrm{cl}(\mathcal{O},q)} P_\tau(x) \leftarrow \mathrm{adom}(x)$
  $\bot \leftarrow \delta(\mathbf{x})$   for all non-realizable diagrams $\delta(\mathbf{x})$
  $\mathsf{goal}(\mathbf{x}') \leftarrow \delta(\mathbf{x})$   for all diagrams $\delta(\mathbf{x})$ that imply $q(\mathbf{x}')$

It is proved in the appendix that the MDDlog query $q_\Pi$ is
equivalent to (S, O, q).

For the converse direction, let $\Pi$ be an MDDlog program.
For each monadic IDB relation A of $\Pi$, we introduce two
fresh unary relations, denoted by $A$ and $\bar{A}$. Let O be the
ALC-ontology that consists of all inclusions of the form

$$\top \sqsubseteq (A \sqcup \bar{A}) \sqcap \neg(A \sqcap \bar{A})$$

and let q be the union of (i) all conjunctive queries that
constitute the body of a goal rule, as well as (ii) all conjunctive
queries obtained from a non-goal rule of the form

$$A_1(\mathbf{x}_1) \vee \cdots \vee A_m(\mathbf{x}_m) \leftarrow R_1(\mathbf{y}_1) \wedge \cdots \wedge R_n(\mathbf{y}_n)$$

by taking the conjunctive query
$\bar{A}_1(\mathbf{x}_1) \wedge \cdots \wedge \bar{A}_m(\mathbf{x}_m) \wedge R_1(\mathbf{y}_1) \wedge \cdots \wedge R_n(\mathbf{y}_n)$.
It can be shown that the ontology-mediated
query (S, O, q), where S is the schema consisting
of the EDB relations of $\Pi$, is equivalent to the query defined
by $\Pi$. □

¹ In fact, this assumption is inessential for Theorems 1 and 3 (which
speak about UCQs), but required for Theorems 2, 4, and 5 (which
speak about AQs) to hold.

ALC with Atomic Queries. We characterize (ALC,AQ)
by a natural fragment of MDDlog. As mentioned in the
introduction, this query language has the same expressive
power as (ALC,CNQ), where CNQ denotes the set of all
ALC-concept queries, that is, queries C(x) with C a (possibly
compound) ALC-concept. Specifically, each query
$(\mathbf{S}, \mathcal{O}, q) \in$ (ALC,CNQ) with q = C(x) can be expressed
as a query $(\mathbf{S}, \mathcal{O}', A(x)) \in$ (ALC,AQ) where A is a fresh
concept name (that is, it does not occur in $\mathbf{S} \cup \mathrm{sig}(\mathcal{O})$) and
$\mathcal{O}' = \mathcal{O} \cup \{C \sqsubseteq A\}$.

Each disjunctive datalog rule can be associated with an 
undirected graph whose nodes are the variables that occur in 
the rule and whose edges reflect co-occurrence of two vari- 
ables in an atom in the rule body. We say that a rule is con- 
nected if its graph is connected, and that a DDlog program 
is connected if all its rules are connected. An MDDlog program
is simple if each rule contains at most one atom $R(\mathbf{x})$
with R an EDB relation; additionally, we require that, in this
atom, every variable occurs at most once.
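The two syntactic conditions can be checked mechanically. The sketch below is an illustration only: it represents an atom as a pair of a predicate name and a tuple of variables, builds the co-occurrence graph of a rule from its body atoms, and tests connectedness by a graph search. The function names and the encoding are hypothetical, and the adom shorthand is treated as an ordinary unary body atom that is not an EDB relation.

```python
def is_connected(head, body):
    """Check that the co-occurrence graph of a rule is connected.

    Nodes are the variables of the rule; two variables are adjacent
    if they occur together in some atom of the rule body.
    """
    variables = {v for _, vs in head + body for v in vs}
    if not variables:
        return True
    adjacency = {v: set() for v in variables}
    for _, vs in body:
        for u in vs:
            adjacency[u].update(vs)
    start = next(iter(variables))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == variables


def is_simple(body, edb):
    """At most one EDB atom per rule, and no repeated variable inside it."""
    edb_atoms = [atom for atom in body if atom[0] in edb]
    no_repeats = all(len(set(vs)) == len(vs) for _, vs in edb_atoms)
    return len(edb_atoms) <= 1 and no_repeats


edb = {"A", "R"}

# goal(x) <- R(x, y): connected and simple
print(is_connected([("goal", ("x",))], [("R", ("x", "y"))]))  # True
print(is_simple([("R", ("x", "y"))], edb))                    # True

# goal(x) <- adom(x), A(y): x and y never co-occur, so not connected
print(is_connected([("goal", ("x",))], [("adom", ("x",)), ("A", ("y",))]))  # False

# bottom <- R(x, x): the variable x is repeated in the EDB atom, so not simple
print(is_simple([("R", ("x", "x"))], edb))  # False
```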

Theorem 2 (ALC,AQ) has the same expressive power as
unary connected simple MDDlog.

Proof (sketch). The translation from (ALC,AQ) to unary
connected simple MDDlog queries is a modified version of
the translation given in the proof of Theorem 1. Assume
that (S, O, q) with q = A(x) is given. We now take types
to be subsets of sub(O) and then define diagrams exactly as
before. The MDDlog program $\Pi$ consists of the following
rules:

  $\bigvee_{\tau \subseteq \mathrm{sub}(\mathcal{O})} P_\tau(x) \leftarrow \mathrm{adom}(x)$
  $\bot \leftarrow \delta(\mathbf{x})$   for all non-realizable diagrams $\delta(\mathbf{x})$
                    of the form $P_\tau(x)$
                    or $P_{\tau_1}(x_1) \wedge P_{\tau_2}(x_2) \wedge S(x_1, x_2)$
  $\mathsf{goal}(x) \leftarrow A(x)$

Clearly, $\Pi$ is unary, connected, and simple. Equivalence of
(S, O, q) and $q_\Pi$ is proved in the appendix.

Conversely, let $\Pi$ be a unary connected simple MDDlog
program. It is easy to rewrite each rule of $\Pi$ into an equivalent
ALC-concept inclusion, where goal is now regarded as a
concept name. For example, $\mathsf{goal}(x) \leftarrow R(x, y)$ is rewritten
into $\exists R.\top \sqsubseteq \mathsf{goal}$ and $P_1(x) \vee P_2(y) \leftarrow R(x, y) \wedge A(x) \wedge B(y)$
is rewritten into $A \sqcap \exists R.(B \sqcap \neg P_2) \sqsubseteq P_1$. Let O be
the resulting ontology and let q = goal(x). Then the query
$q_\Pi$ is equivalent to the query (S, O, q), where S consists of
the EDB relations in $\Pi$. □

Note that the connectedness condition is required since one
cannot express MDDlog rules such as $\mathsf{goal}(x) \leftarrow \mathrm{adom}(x) \wedge A(y)$
with $y \neq x$ in (ALC,AQ). Multiple variable occurrences
in EDB relations have to be excluded because programs
such as $\mathsf{goal}(x) \leftarrow A(x)$, $\bot \leftarrow R(x, x)$ (return all
elements in A if the instance contains no reflexive R-edge,
and return the active domain otherwise) cannot be expressed
in (ALC,AQ) either.

Extensions of ALC. We identify several standard extensions
of (ALC,UCQ) and (ALC,AQ) that have the same expressive
power, and some that do not. We introduce the relevant
extensions only briefly and refer to Q for more details.
ALCI is the extension of ALC in which one can state
that a role name R is the inverse of a role name S, that is,
$\forall x y\,(R(x, y) \leftrightarrow S(y, x))$; ALCH is the extension in which
one can state that a role name R is included in a role name S,
that is, $\forall x y\,(R(x, y) \rightarrow S(x, y))$; $\mathcal{S}$ is the extension in which
one can require some role names to be interpreted as transitive
relations; ALCF is the extension in which one can state
that some role names are interpreted as partial functions;
and ALCU is the extension with the universal role U,
interpreted as $\mathrm{dom} \times \mathrm{dom}$ in any relational structure $\mathfrak{B}$ with
domain dom. Note that U should be regarded as a logical
symbol and is not a member of any schema. All these means
of expressivity are included in the OWL2 DL profile of the
W3C-standardized ontology language OWL2.

We use the usual naming scheme to denote combinations
of these extensions, for example $\mathcal{ALCHI}$ for the union
of $\mathcal{ALCH}$ and $\mathcal{ALCI}$, and $\mathcal{SHI}$ for the union of $\mathcal{S}$ and
$\mathcal{ALCHI}$. The following result summarizes the expressive
power of extensions of $\mathcal{ALC}$.

Theorem 3

1. ($\mathcal{ALCHIU}$,UCQ) has the same expressive power as
MDDlog and as ($\mathcal{ALC}$,UCQ).

2. ($\mathcal{S}$,UCQ) and ($\mathcal{ALCF}$,UCQ) are strictly more expressive
than ($\mathcal{ALC}$,UCQ).

Proof. (sketch) In Point 1, we start with ($\mathcal{ALCIU}$,UCQ),
for which the result follows from Theorem 6 below given
that $\mathcal{ALCIU}$ is a fragment of UNFO. Role inclusions
$\forall xy(R(xy) \rightarrow S(xy))$ do not add expressive power since
they can be replaced by adding to the ontology the inclusions
$\exists R.C \sqsubseteq \exists S.C$ for all $C \in \mathrm{sub}(\mathcal{O})$, and then replacing
every atom $S(x, y)$ with $R(x, y) \vee S(x, y)$ in the UCQ.

For Point 2, we separate ($\mathcal{S}$,UCQ) from ($\mathcal{ALC}$,UCQ)
by showing that the following ontology-mediated query
$(\mathbf{S}_1, \mathcal{O}_1, q_1)$ cannot be expressed in ($\mathcal{ALC}$,UCQ): $\mathbf{S}_1$ consists
of two role names $R$ and $S$, $\mathcal{O}_1$ states that these role
names are both transitive, and $q_1 = \exists xy(R(x, y) \wedge S(x, y))$.
For ($\mathcal{ALCF}$,UCQ), we show that $(\mathbf{S}_2, \mathcal{O}_2, q_2)$ cannot be
expressed in ($\mathcal{ALC}$,UCQ), where $\mathbf{S}_2$ consists of role name $R$
and concept name $A$, $\mathcal{O}_2$ states that $R$ is functional, and
$q_2 = A(x)$. □

The following result is interesting when contrasted with
Point 2 of Theorem 3: when ($\mathcal{ALC}$,UCQ) is replaced with
($\mathcal{ALC}$,AQ), the addition of transitive roles no
longer increases the expressive power. In fact, it is folklore
that transitive roles can then be replaced similarly to the role
inclusions in the proof above, see [34].

Theorem 4 ($\mathcal{ALC}$,AQ) has the same expressive power as
($\mathcal{SHI}$,AQ).

It follows from (3E] that this observation can be extended
much beyond transitive roles. The addition of the universal
role on the side of OBDA query languages corresponds,
on the MDDlog side, to dropping the requirement that rule
bodies must be connected. For example, the MDDlog query
$\mathrm{goal}(x) \leftarrow \mathrm{adom}(x) \wedge A(y)$ can then be expressed using the
ontology $\mathcal{O} = \{\exists U.A \sqsubseteq \mathrm{goal}\}$ and the AQ $\mathrm{goal}(x)$.

Theorem 5 ($\mathcal{ALCU}$,AQ) and ($\mathcal{SHIU}$,AQ) both have the
same expressive power as unary simple MDDlog.

We close this section with a brief discussion of Boolean
atomic queries (BAQs), that is, queries of the form $\exists x.A(x)$,
where $A$ is a unary relation symbol. BAQs behave similarly
to AQs, and one can show modified versions of Theorems 2
to 5 above in which AQs are replaced by BAQs and
unary goal predicates by 0-ary goal predicates, respectively.

3.2 Ontologies Specified in First-Order Logic 

Ontologies formulated in description logic are not able to
speak about relation symbols of arity greater than two.² To
overcome this restriction, we consider the guarded fragment
of first-order logic and the unary-negation fragment of first-order
logic Q [39]. Both generalize the description logic
$\mathcal{ALC}$ in different ways. We also consider their natural common
generalization, the guarded negation fragment of first-order
logic [5]. Our results from the previous subsection
turn out to generalize to all these fragments. We start by
considering the unary-negation fragment.

The unary-negation fragment of first-order logic (UNFO)
[39] is the fragment of first-order logic that consists of those
formulas that are generated from atomic formulas, including
equality, using conjunction, disjunction, existential quantification,
and unary negation, that is, negation applied to a
formula with at most one free variable. Thus, for example,
$\neg \exists xy\, R(x, y)$ belongs to UNFO, whereas $\exists xy\, \neg R(x, y)$ does
not. It is easy to show that every $\mathcal{ALC}$-TBox is equivalent to
a UNFO sentence.

Theorem 6 (UNFO,UCQ) has the same expressive power 
as MDDlog. 

Proof. (sketch) The translation from MDDlog to
(UNFO,UCQ) is given by Theorem 1. Here, we provide
the translation from (UNFO,UCQ) to MDDlog. Let $Q =$

² There are actually a few DLs that can handle relations of unrestricted
arity [16]. Those DLs are not considered in this paper, but
could easily be treated using the methods introduced here.



$(\mathbf{S}, \mathcal{O}, q) \in$ (UNFO,UCQ) be given. We assume that $\mathcal{O}$ is a
single UNFO sentence that is in the normal form generated
by the following grammar (which only generates formulas
with at most one free variable):

$\varphi(x) ::= \top \mid \neg\varphi(x) \mid \exists \mathbf{y}\,(\psi_1(x, \mathbf{y}) \wedge \cdots \wedge \psi_n(x, \mathbf{y}))$

where each $\psi_i$ is either a relational atom or a formula in
at most one free variable generated by the same grammar.
Note that no equality is used. Easy syntactic manipulations
show that every UNFO-formula with at most one free
variable is equivalent to a disjunction of formulas generated
by the above grammar. In the case of $\mathcal{O}$, we may furthermore
assume that it is a single such sentence, rather than a
disjunction, because $\mathrm{cert}_{q, \mathcal{O}_1 \vee \mathcal{O}_2}(\mathfrak{D})$ is the intersection of
$\mathrm{cert}_{q, \mathcal{O}_1}(\mathfrak{D})$ and $\mathrm{cert}_{q, \mathcal{O}_2}(\mathfrak{D})$, and MDDlog is closed under
taking intersections of queries.

Let $\mathrm{sub}(\mathcal{O})$ be the set of all subformulas of $\mathcal{O}$ with at most
one free variable $z$ (we apply a one-to-one renaming of variables
where needed to ensure that each formula in $\mathrm{sub}(\mathcal{O})$
with a free variable has the same free variable $z$). Let $k$ be
the maximum of the number of variables in $\mathcal{O}$ and the number
of variables in $q$. We denote by $\mathrm{cl}_k(\mathcal{O})$ the set of all
formulas $\chi(x)$ of the form

$\exists \mathbf{y}\,(\psi_1(x, \mathbf{y}) \wedge \cdots \wedge \psi_n(x, \mathbf{y}))$

with $\mathbf{y} = y_1, \ldots, y_m$, $m \leq k$, where each $\psi_i$ is either a
relational atom or is of the form $\chi(x)$ or $\chi(y_i)$, for $\chi(z) \in
\mathrm{sub}(\mathcal{O})$. A type $\tau(x)$ is a subset of $\mathrm{cl}_k(\mathcal{O})$, and we denote
the set of all types by $\mathrm{type}(\mathcal{O})$.

We introduce a fresh unary relation symbol $P_\tau$ for each
type $\tau(x)$, and we denote by $\mathbf{S}'$ the schema that extends $\mathbf{S}$
with these additional relations. Call a structure $\mathfrak{B}$ over $\mathbf{S}' \cup
\mathrm{sig}(\mathcal{O})$ type-coherent if $P_\tau(d) \in \mathfrak{B}$ iff $\tau(x) = \{\varphi(x) \in
\mathrm{cl}_k(\mathcal{O}) \mid (\mathrm{adom}(\mathfrak{B}), \mathfrak{B}) \models \varphi[d]\}$ for all types $\tau(x)$ and
elements $d$ in the domain of $\mathfrak{B}$. By a diagram we will mean a
conjunction $\delta(x_1, \ldots, x_m)$ with $m \leq k$ of atomic facts over
the schema $\mathbf{S}'$. Realizability and "implying $q$" are defined
as in the proof of Theorem 1. It follows from [39] that it is
decidable whether a diagram implies a query, and whether a
diagram is realizable. The MDDlog program is defined as in
the proof of Theorem 1, where now, in the first rule, $\tau$ ranges
over types in $\mathrm{type}(\mathcal{O})$.

With this construction we have that the MDDlog query defined by $\Pi$
is equivalent to $Q$. The proof is in the appendix. □

Next, we consider the guarded fragment of first-order logic
(GF). It comprises all formulas built up from atomic formulas
using the Boolean connectives and guarded quantification
of the form $\exists \mathbf{x}(\alpha \wedge \varphi)$ and $\forall \mathbf{x}(\alpha \rightarrow \varphi)$, where, in both
cases, $\alpha$ is an atomic formula (a "guard") that contains all
free variables of $\varphi$. To simplify the presentation of the results,
we will consider here the equality-free version of the
guarded fragment (analogous results can be obtained for the
guarded fragment with equality, but they are slightly more
difficult to state). We do allow one special case of equality,
namely the use of trivial equalities of the form $x = x$
as guards, which is equivalent to allowing unguarded quantifiers
applied to formulas with at most one free variable.

It turns out that (GF, UCQ) is strictly more expressive than 
MDDlog. The proof of the following proposition is given in 
the appendix. 

Proposition 1 The Boolean query

there are $a_1, \ldots, a_n, b$, with $n \geq 2$, such that $A(a_1)$,
$B(a_n)$, and $P(a_i, b, a_{i+1})$ for all $1 \leq i < n$

is definable in (GF,UCQ) and not in MDDlog.

As fragments of first-order logic, the unary-negation fragment
and the guarded fragment are incomparable in expressive
power. They have a common generalization, which is
known as the guarded-negation fragment (GNFO) [6]. This
fragment is defined in the same way as UNFO, except that,
besides unary negation, we allow guarded negation of the
form $\alpha \wedge \neg\varphi$, where the guard $\alpha$ is an atomic formula that
contains all the variables of $\varphi$. Again, for simplicity, we
consider here the equality-free version of the language, except
that we allow the use of trivial equalities of the form
$x = x$ as guards. As we will see, for the purpose of
OBDA, GNFO is no more powerful than GF. Specifically,
(GF,UCQ) and (GNFO,UCQ) are expressively equivalent
to a natural generalization of MDDlog, namely frontier-guarded
DDlog. Recall that a datalog rule is guarded if its
body includes an atom that contains all variables which occur
in the rule p3). A weaker notion of guardedness, which
we will call here frontier-guardedness, inspired by f3','5l, requires
that, for each atom $\alpha$ in the head of the rule, there is an
atom $\beta$ in the body of the rule such that all variables that occur
in $\alpha$ occur also in $\beta$. We define a frontier-guarded DDlog
query to be a query defined by a DDlog program in which
every rule is frontier-guarded. Observe that frontier-guarded
DDlog subsumes MDDlog.
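
The two syntactic conditions can be checked mechanically. The following is a small sketch, assuming a hypothetical encoding of a rule as a pair of atom lists (head, body) where each atom is a predicate name with a tuple of variable names; the encoding and function names are ours, not the paper's.

```python
def variables(atom):
    """An atom is a pair (predicate, tuple of variable names)."""
    return set(atom[1])

def is_guarded(head, body):
    """Some body atom contains every variable that occurs in the rule."""
    all_vars = set().union(*(variables(a) for a in head + body))
    return any(all_vars <= variables(b) for b in body)

def is_frontier_guarded(head, body):
    """Every head atom has a body atom that contains all of its variables."""
    return all(any(variables(h) <= variables(b) for b in body) for h in head)

# P1(x) v P2(y) <- R(x,y), A(x), B(y)
rule1 = ([("P1", ("x",)), ("P2", ("y",))],
         [("R", ("x", "y")), ("A", ("x",)), ("B", ("y",))])
# goal(x) <- R(x,y), S(y,z)
rule2 = ([("goal", ("x",))], [("R", ("x", "y")), ("S", ("y", "z"))])
# Q(x,z) <- R(x,y), S(y,z)
rule3 = ([("Q", ("x", "z"))], [("R", ("x", "y")), ("S", ("y", "z"))])

for head, body in (rule1, rule2, rule3):
    print(is_guarded(head, body), is_frontier_guarded(head, body))
```

The second rule illustrates the gap between the two notions: no body atom covers all of x, y, z, but the unary head atom is covered by R(x,y). A rule with only unary head atoms whose variables occur in the body always passes the frontier-guardedness check, which is the reason MDDlog is subsumed.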

Theorem 7 (GF,UCQ) and (GNFO,UCQ) have the same
expressive power as frontier-guarded DDlog.

Theorem 7 is proved via translations from (GNFO,UCQ)
to frontier-guarded DDlog and back that are along the same
lines as the translations from (UNFO,UCQ) to MDDlog and
back. In addition, we use a result from [6] to obtain a translation
from (GNFO,UCQ) to (GF,UCQ).

4. OBDA and MMSNP 

We show that MDDlog captures coMMSNP and thus, 
by the results obtained in the previous section, the same 
is true for many OBDA languages based on UCQs. We 
then use this connection to transfer results from MMSNP to 
OBDA languages with UCQs, linking the data complexity 
of these languages to the Feder-Vardi conjecture and estab- 
lishing decidability of query containment. To extend these 
results to (GF,UCQ) and (GNFO,UCQ), we generalize MM- 
SNP by replacing unary SO-quantification with guarded SO- 
quantification. 



An MMSNP formula over schema $\mathbf{S}$ has the form
$\exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m\, \varphi$ with $X_1, \ldots, X_n$ monadic
second-order (SO) variables, $x_1, \ldots, x_m$ FO-variables, and
$\varphi$ a conjunction of quantifier-free formulas of the form
$\psi = \alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \beta_1 \vee \cdots \vee \beta_m$, $n, m \geq 0$, where each
$\alpha_i$ is of the form $X_i(x)$, $R(\mathbf{x})$ (with $R \in \mathbf{S}$), or $x = y$, and
each $\beta_i$ is of the form $X_i(x)$. In order to use MMSNP as a
query language, and in contrast to the standard definition,
we admit free FO-variables and speak of sentences to refer
to MMSNP formulas without free variables. To connect
with the query languages studied thus far, we are interested
in queries obtained by the complements of MMSNP formulas:
each schema $\mathbf{S}$ and MMSNP formula $\Phi$ with $n$ free
variables gives rise to a query

$q_{\Phi, \mathbf{S}}(\mathfrak{D}) = \{\mathbf{a} \in \mathrm{adom}(\mathfrak{D})^n \mid (\mathrm{adom}(\mathfrak{D}), \mathfrak{D}) \not\models \Phi[\mathbf{a}]\}$

where we set $(\mathrm{adom}(\mathfrak{D}), \mathfrak{D}) \models \Phi$ to true when $\mathfrak{D}$ is the
empty instance (that is, $\mathrm{adom}(\mathfrak{D}) = \emptyset$) and $\Phi$ is a sentence.
We observe that the resulting query language coMMSNP has
the same expressive power as MDDlog; in particular, implications
$\vartheta \rightarrow \bot$ in MMSNP formulas correspond to goal
rules in MDDlog programs.

Proposition 2 coMMSNP has the same expressive power as 
MDDlog. 

Thus, the characterizations of OBDA languages in terms of
MDDlog established in Section 3 allow us to transfer results
from MMSNP to OBDA. We start with considering the data
complexity of the query evaluation problem: for a query $q$,
the evaluation problem is to decide, given an instance $\mathfrak{D}$ and
a tuple $\mathbf{a}$ of elements from $\mathfrak{D}$, whether $\mathbf{a} \in q(\mathfrak{D})$. Our first
result is that the Feder-Vardi dichotomy conjecture for CSPs
is true if and only if there is a dichotomy between PTIME and
coNP for query evaluation in ($\mathcal{ALC}$,UCQ), and the same is
true for several other OBDA languages. For brevity, we say
that a query language has a dichotomy between PTIME and
coNP, referring only implicitly to the evaluation problem.

The proof of the following theorem relies on Proposition 2
and Theorems 1, 3, and 6. It also exploits the fact that the
Feder-Vardi dichotomy conjecture can equivalently be stated
for MMSNP sentences [20, 29]. Some technical development
is needed to deal with the presence of free variables.

Theorem 8 ($\mathcal{ALC}$,UCQ) has a dichotomy between PTIME
and coNP iff the Feder-Vardi conjecture holds. The same is
true for ($\mathcal{ALCHIU}$,UCQ) and (UNFO,UCQ).

By Proposition 1, (GF,UCQ) and (GNFO,UCQ) are strictly
more expressive than MDDlog and we cannot use Proposition 2
to relate these query languages to the Feder-Vardi conjecture.
Theorem 7 suggests that it would be useful to have
a generalization of MMSNP that is equivalent to frontier-guarded
DDlog. Such a generalization is introduced next.

A formula of guarded monotone strict NP (abbreviated
GMSNP) has the form $\exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m\, \varphi$ with
$X_1, \ldots, X_n$ SO variables of any arity, $x_1, \ldots, x_m$ FO-variables,
and $\varphi$ a conjunction of formulas $\psi = \alpha_1 \wedge \cdots \wedge \alpha_n
\rightarrow \beta_1 \vee \cdots \vee \beta_m$, $n, m \geq 0$, where each $\alpha_i$ is of the form
$X_i(\mathbf{x})$, $R(\mathbf{x})$ (with $R \in \mathbf{S}$), or $x = y$, and each $\beta_i$ is of the
form $X_i(\mathbf{x})$. Additionally, we require that for every head
atom $\beta_i$, there is a body atom $\alpha_j$ such that $\alpha_j$ contains all
variables from $\beta_i$. GMSNP gives rise to a query language
coGMSNP in analogy with the definition of coMMSNP. It
can be shown by a straightforward syntactic transformation
that every MMSNP formula is equivalent to some GMSNP
formula. This is part of the proof of the following proposition,
which also relies on Proposition 1 and Theorem 7.

Proposition 3 coGMSNP has the same expressive power as 
frontier-guarded DDlog and is strictly more expressive than 
coMMSNP. 

Note that, although defined in a different way, GMSNP is essentially
the same logic (and has the same expressive power)
as MMSNP$_2$ from [33]. There, it was left open to prove
that MMSNP$_2$ is more expressive than MMSNP, which is
resolved by Proposition 3. Another relevant result that we
add for GMSNP (equivalently: MMSNP$_2$) is that this logic
is computationally as well-behaved as MMSNP and CSP: 
although GMSNP is strictly more expressive than MMSNP, 
we can still find, for every coGMSNP query q, a coMMSNP 
query q' that is polynomially equivalent: evaluating q can be 
reduced in polynomial time to evaluating q' , and vice versa 
(by standard Karp reductions). Intuitively, q' is constructed 
so that it holds on the incidence graph of a structure when- 
ever q holds on the structure itself. 

Theorem 9 For every coGMSNP query, there is a coMMSNP
query that is polynomially equivalent (and vice versa).

From Theorem [T] Proposition [3] and Theorem |9] we obtain 
the following. 

Corollary 1 (GF,UCQ) has a dichotomy between PTIME
and coNP iff the Feder-Vardi conjecture holds. The same
is true for (GNFO,UCQ).

Recall that ($\mathcal{ALCF}$,UCQ) and ($\mathcal{S}$,UCQ) are two extensions
of ($\mathcal{ALC}$,UCQ) that were identified in Section 3 to be more
expressive than ($\mathcal{ALC}$,UCQ) itself. It was already proved
in [32] (Theorem 27) that, compared to ontology-mediated
queries based on $\mathcal{ALC}$, the functional roles of $\mathcal{ALCF}$ dramatically
increase the computational power. This is true
even for atomic queries.

Theorem 10 ([ ,32]) For every NP Turing machine $M$, there
is a query $q$ in ($\mathcal{ALCF}$,AQ) such that the complement of
the word problem of $M$ has the same complexity as evaluating
$q$, up to polynomial time reductions. Consequently,
($\mathcal{ALCF}$,AQ) does not have a dichotomy between PTIME and
coNP (unless PTIME = NP).

We leave it as an open problem to analyse the computational
power of ($\mathcal{S}$,UCQ).

There are other interesting results that can be transferred
from MMSNP to OBDA. Here, we consider query containment.
Specifically, the following general containment problem
was proposed in [8] as a powerful tool for OBDA:
given ontology-based queries $(\mathbf{S}, \mathcal{O}_i, q_i)$, $i \in \{1, 2\}$, decide
whether for all $\mathbf{S}$-instances $\mathfrak{D}$, we have $\mathrm{cert}_{q_1, \mathcal{O}_1}(\mathfrak{D}) \subseteq
\mathrm{cert}_{q_2, \mathcal{O}_2}(\mathfrak{D})$.³ Applications include the optimization of
ontology-based queries and managing the effects on query
answering of replacing an ontology with a new, updated version.
In terms of OBDA languages such as ($\mathcal{ALC}$,UCQ),
the above problem corresponds to query containment in the
standard sense: an $\mathbf{S}$-query $q_1$ is contained in an $\mathbf{S}$-query
$q_2$, written $q_1 \subseteq q_2$, if for every $\mathbf{S}$-instance $\mathfrak{D}$, we have
$q_1(\mathfrak{D}) \subseteq q_2(\mathfrak{D})$. Note that there are also less general (and
computationally simpler) notions of query containment in
OBDA that do not fix the data schema [16].

It was proved in [20] that containment of MMSNP sentences
is decidable. Since the translation underlying the
proof of Theorem 9 preserves containment, the same is true
for GMSNP. We thus obtain the following result for OBDA
languages.

Theorem 11 Query containment is decidable for all of
the following OBDA query languages: ($\mathcal{ALC}$,UCQ),
($\mathcal{ALCHIU}$,UCQ), (UNFO,UCQ), (GF,UCQ), and
(GNFO,UCQ).

Note that this result is considerably stronger than those
in [8], which considered only containment of ontology-mediated
queries $(\mathbf{S}, \mathcal{O}, q)$ with $q$ an atomic query,
since already this basic case turned out to be technically
intricate. The case of CQs and UCQs was left completely
open, including all cases stated in Theorem 11.



5. OBDA and CSP 

We show that OBDA languages based on AQs capture
CSPs (and generalizations thereof) and transfer results from
CSPs to OBDA languages. In contrast to the previous section,
we obtain a richer set of results and often even worst-case
optimal decision procedures. Recall that each finite relational
structure $\mathfrak{B}$ over a schema $\mathbf{S}$ gives rise to a constraint
satisfaction problem, which is to decide, given a finite
relational structure $\mathfrak{A}$ over $\mathbf{S}$, whether there is a homomorphism
from $\mathfrak{A}$ to $\mathfrak{B}$ (written $\mathfrak{A} \rightarrow \mathfrak{B}$). In this context, the
relational structure $\mathfrak{B}$ is also called the template of the CSP.
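
As an illustration of the decision problem, the following sketch tests $\mathfrak{A} \rightarrow \mathfrak{B}$ by trying every map between the two domains. The dictionary encoding of structures (relation name to set of tuples) and the function names are assumptions made for this example, the domain is taken to be the set of elements that occur in some tuple, and the search is exponential, so it is only meant for very small inputs.

```python
from itertools import product

def domain(structure):
    """Elements that occur in some tuple of the structure."""
    return sorted({e for tuples in structure.values() for t in tuples for e in t},
                  key=repr)

def has_homomorphism(a, b):
    """Try every map from dom(a) to dom(b) and check that all facts are preserved."""
    a_dom, b_dom = domain(a), domain(b)
    for image in product(b_dom, repeat=len(a_dom)):
        h = dict(zip(a_dom, image))
        if all(tuple(h[e] for e in t) in b.get(rel, set())
               for rel, tuples in a.items() for t in tuples):
            return True
    return False

# Template K2: the CSP it defines is 2-colourability.
k2 = {"E": {(0, 1), (1, 0)}}
square = {"E": {(0, 1), (1, 2), (2, 3), (3, 0)}}
triangle = {"E": {(0, 1), (1, 2), (2, 0)}}
print(has_homomorphism(square, k2), has_homomorphism(triangle, k2))
```

The 4-cycle maps to the template while the triangle does not, so only the triangle would belong to the complement query defined by this template.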

CSPs give rise to a query language coCSP in the spirit
of the query language coMMSNP used in the previous
section. This language turns out to have exactly the same
expressive power as ($\mathcal{ALC}$,BAQ), where BAQ is the class
of Boolean atomic queries. To also cover non-Boolean
AQs, we consider two natural generalizations of CSPs.
First, a generalized CSP is defined by a finite set $\mathcal{F}$ of
templates, rather than only a single one. The problem
then consists in deciding, given an input structure $\mathfrak{A}$,
whether there is a template $\mathfrak{B} \in \mathcal{F}$ such that $\mathfrak{A} \rightarrow \mathfrak{B}$.
Second, in a (generalized) CSP with constant symbols, both
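³ In fact, this definition is slightly different from the one used in
[8]. There, containment is defined only over instances $\mathfrak{D}$ that are
consistent w.r.t. $\mathcal{O}_1$ and $\mathcal{O}_2$, i.e., where there is at least one finite
$\mathbf{S}$-structure $(\mathrm{dom}, \mathfrak{D}')$ such that $\mathfrak{D} \subseteq \mathfrak{D}'$ and $\mathfrak{D}' \in \mathrm{Mod}(\mathcal{O}_i)$.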
the template(s) and the input structure are endowed with
constants p9] [T]. To be more precise, let $\mathbf{S}$ be a schema
and $\mathbf{c} = c_1, \ldots, c_m$ a finite sequence of distinct constant
symbols. A finite relational structure over $\mathbf{S} \cup \mathbf{c}$ has the
form $(\mathfrak{A}, d_1, \ldots, d_m)$ with $\mathfrak{A}$ a finite relational structure
over $\mathbf{S}$ that, in addition, interprets the constant symbols $c_i$
by elements $d_i$ of the domain dom of $\mathfrak{A}$, for $1 \leq i \leq m$. Let
$(\mathfrak{A}, \mathbf{a})$ and $(\mathfrak{B}, \mathbf{b})$ be finite relational structures over $\mathbf{S} \cup \mathbf{c}$.
A mapping $h$ is a homomorphism from $(\mathfrak{A}, \mathbf{a})$ to $(\mathfrak{B}, \mathbf{b})$,
written $(\mathfrak{A}, \mathbf{a}) \rightarrow (\mathfrak{B}, \mathbf{b})$, if it is a homomorphism from $\mathfrak{A}$
to $\mathfrak{B}$ and $h(a_i) = b_i$ for $1 \leq i \leq m$. A (generalized) CSP
with constant symbols is then defined like a (generalized)
CSP, based on this extended notion of homomorphism.

We now introduce the query languages obtained from the
different versions of CSPs, where generalized CSPs with
constant symbols constitute the most general case. Specifically,
each finite set of templates $\mathcal{F}$ over $\mathbf{S} \cup \mathbf{c}$ with $\mathbf{c} =
c_1, \ldots, c_m$ gives rise to an $m$-ary query $\mathrm{coCSP}(\mathcal{F})$ that maps
every $\mathbf{S}$-instance $\mathfrak{D}$ to

$\{\mathbf{d} \in \mathrm{adom}(\mathfrak{D})^m \mid \forall (\mathfrak{B}, \mathbf{b}) \in \mathcal{F} : (\mathfrak{D}, \mathbf{d}) \not\rightarrow (\mathfrak{B}, \mathbf{b})\}$,

where we view $(\mathfrak{D}, \mathbf{d})$ as a finite relational structure whose
domain is $\mathrm{adom}(\mathfrak{D})$. The query language that consists of all
such queries is called generalized coCSP with constant symbols.
The fragment of this query language that is obtained by
admitting only sets of templates $\mathcal{F}$ without constant symbols
is called generalized coCSP, and the fragment induced by
singleton sets $\mathcal{F}$ without constant symbols is called coCSP.

Example 3 Selecting an illustrative fragment of Examples^
and^ let $\mathcal{O} = \{\mathrm{CancerInFamily} \sqsubseteq \forall \mathrm{child}.\mathrm{CancerInFamily}\}$
and $\mathbf{S} = \{\mathrm{CancerInFamily}, \mathrm{child}\}$. Moreover, let $q_2(x) =
\mathrm{CancerInFamily}(x)$ be the query from Example^ To identify
a query in coCSP with constant symbols that is equivalent
to the ontology-mediated query $(\mathbf{S}, \mathcal{O}, q_2)$, let $\mathfrak{B}$ be the
following template:



[Figure: the template $\mathfrak{B}$ drawn as a graph; recoverable labels: one node labelled CancerInFamily and edges labelled child.]



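It can be shown that for all instances $\mathfrak{D}$ over $\mathbf{S}$ and for
all $d \in \mathrm{adom}(\mathfrak{D})$, we have $d \in \mathrm{cert}_{q_2, \mathcal{O}}(\mathfrak{D})$ iff $(\mathfrak{D}, d) \not\rightarrow
(\mathfrak{B}, a)$, and thus the query $\mathrm{coCSP}(\mathfrak{B})$ is as required.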
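
The definition of coCSP with constant symbols can be evaluated directly on small inputs. The sketch below is an illustration under stated assumptions: the encoding of a template as a triple (domain, relations, constants) and the function names are ours, and the two-element template used here is our own reading of what a suitable template for the ontology of Example 3 can look like (an element a, an element c labelled CancerInFamily, child edges a to a, a to c, and c to c, constant a); it is not copied from the figure.

```python
from itertools import product

def active_domain(instance):
    """Elements that occur in some fact of the instance."""
    return sorted({e for tuples in instance.values() for t in tuples for e in t},
                  key=repr)

def pointed_hom_exists(instance, d, template):
    """Is there a homomorphism that also maps the tuple d to the template constants?"""
    b_dom, b_rels, b_consts = template
    adom = active_domain(instance)
    for image in product(b_dom, repeat=len(adom)):
        h = dict(zip(adom, image))
        if any(h[x] != b for x, b in zip(d, b_consts)):
            continue
        if all(tuple(h[e] for e in t) in b_rels.get(rel, set())
               for rel, tuples in instance.items() for t in tuples):
            return True
    return False

def co_csp(templates, instance, arity):
    """Tuples d such that (instance, d) maps to none of the templates."""
    adom = active_domain(instance)
    return {d for d in product(adom, repeat=arity)
            if not any(pointed_hom_exists(instance, d, t) for t in templates)}

# Assumed template: constant "a" must avoid the CancerInFamily element "c".
template = (
    ["a", "c"],
    {"CancerInFamily": {("c",)}, "child": {("a", "a"), ("a", "c"), ("c", "c")}},
    ("a",),
)
instance = {
    "CancerInFamily": {("ann",)},
    "child": {("ann", "bob"), ("bob", "cat"), ("dan", "eve")},
}
print(sorted(co_csp([template], instance, 1)))
```

With this template, an element is returned exactly when it is labelled CancerInFamily or is reachable along child edges from such an element, which matches the certain answers one expects for the ontology above.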

The following theorem summarizes the connections between
OBDA languages with (Boolean) atomic queries, MDDlog,
and CSPs. Note that we consider binary schemas only.

Theorem 12 The following are lists of query languages that
have the same expressive power:

1. ($\mathcal{ALCU}$,AQ), ($\mathcal{SHIU}$,AQ), unary simple MDDlog, and
generalized coCSP with one constant symbol;

2. ($\mathcal{ALC}$,AQ), ($\mathcal{SHI}$,AQ), unary connected simple
MDDlog, and generalized coCSPs with one constant
symbol such that all templates are identical except
for the interpretation of the constant symbol;

3. ($\mathcal{ALC}$,BAQ), ($\mathcal{SHI}$,BAQ), Boolean connected simple
MDDlog, and coCSP;

4. ($\mathcal{ALCU}$,BAQ), ($\mathcal{SHIU}$,BAQ), Boolean simple MDDlog,
and generalized coCSP.

Moreover, given the ontology-mediated query or monadic
datalog program, the corresponding CSP can always be constructed
in exponential time.

Proof. (idea) The equivalences between OBDA languages
and fragments of MDDlog have been proved in Section 3.
We concentrate on a sketch of how ($\mathcal{ALC}$,BAQ) can be translated
into coCSP, and vice versa.

For a query $(\mathbf{S}, \mathcal{O}, q) \in$ ($\mathcal{ALC}$,BAQ), we construct a CSP
template $\mathfrak{B}$ as follows. The domain consists of all subsets
$\tau \subseteq \mathrm{sub}(\mathcal{O})$ that are realized in some model of $\mathcal{O}$, that is, for
which there is a finite relational structure $\mathfrak{A} \in \mathrm{Mod}(\mathcal{O})$ and
a $d$ in the domain of $\mathfrak{A}$ such that $d$ makes true precisely those
concepts in $\mathrm{sub}(\mathcal{O})$ that are in $\tau$. The relation symbols are
then interpreted in a maximal way such that $\mathfrak{B} \models C[\tau]$ iff
$C \in \tau$ holds for all $\tau$ in the domain of $\mathfrak{B}$ and $C \in \mathrm{sub}(\mathcal{O})$.

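Conversely, for a CSP template $\mathfrak{B}$ over schema $\mathbf{S}$, we
construct an ontology-mediated query $(\mathbf{S}, \mathcal{O}, q)$ as follows.
Choose a fresh unary relation symbol $A$ and, for every $d$ in
the domain of $\mathfrak{B}$, a fresh unary relation symbol $A_d$. Then set
$q = \exists x.A(x)$ and

$\mathcal{O} = \{A_d \sqcap A_{d'} \sqsubseteq A \mid d \neq d'\} \;\cup$
$\{A_d \sqcap \exists R.A_{d'} \sqsubseteq A \mid R(d, d') \notin \mathfrak{B}\} \;\cup$
$\{A_d \sqcap B \sqsubseteq A \mid B(d) \notin \mathfrak{B}\} \;\cup$
$\{\top \sqsubseteq \bigsqcup_{d \in \mathrm{dom}(\mathfrak{B})} A_d\}$ □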
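
The converse construction in the proof idea above (from a CSP template to an ontology with one fresh concept name per template element) is purely syntactic and easy to enumerate. The sketch below prints the axioms as strings; the function name and the encoding of the template (a list of elements, a dictionary of concept-name extensions, and a dictionary of role extensions over a binary schema) are hypothetical choices for this illustration.

```python
from itertools import combinations

def template_to_ontology(dom, concepts, roles):
    """Axioms of O for a template with domain `dom` over a binary schema.

    concepts: concept name -> set of elements; roles: role name -> set of pairs.
    """
    # Two distinct "colours" on one element trigger A (one axiom per unordered
    # pair suffices because conjunction is commutative).
    axioms = [f"A_{d} ⊓ A_{e} ⊑ A" for d, e in combinations(dom, 2)]
    # A role edge between colours that are not connected in the template triggers A.
    for r, pairs in roles.items():
        axioms += [f"A_{d} ⊓ ∃{r}.A_{e} ⊑ A"
                   for d in dom for e in dom if (d, e) not in pairs]
    # A concept name on a colour that does not carry it in the template triggers A.
    for b, members in concepts.items():
        axioms += [f"A_{d} ⊓ {b} ⊑ A" for d in dom if d not in members]
    # Every element receives at least one colour.
    axioms.append("⊤ ⊑ " + " ⊔ ".join(f"A_{d}" for d in dom))
    return axioms

k2_axioms = template_to_ontology([0, 1], {}, {"E": {(0, 1), (1, 0)}})
for axiom in k2_axioms:
    print(axiom)
```

For the 2-colourability template, the output says that each element is coloured 0 or 1 and that the concept A is derived whenever an element has both colours or an E-edge connects two elements of the same colour, so the Boolean query $\exists x.A(x)$ is certain exactly on the instances that admit no homomorphism to the template.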

Theorem 12 allows us to transfer results about CSPs to the
setting of OBDA, which, given the recent progress in the
study of CSPs, turns out to be very fruitful. We start with
the Feder-Vardi conjecture, which, we show, can be equivalently
stated for generalized CSPs with constant symbols.
Theorem 12 thus yields the following:

Theorem 13 ($\mathcal{ALC}$,BAQ) has a dichotomy between PTIME
and coNP iff the Feder-Vardi conjecture holds. The same is
true for ($\mathcal{SHIU}$,AQ) and ($\mathcal{SHIU}$,BAQ).

Note that for ($\mathcal{ALC}$,BAQ), the "if" direction already follows
from Theorem 8, but the converse direction does not.
We now consider further interesting applications of Theorem 12,
in particular to deciding query containment, FO-rewritability,
and datalog-rewritability.

5.1 Query Containment 

In Section 4, we have established decidability results
for query containment in OBDA languages based on
UCQs. For OBDA languages based on AQs and BAQs,
we even obtain a tight complexity bound. Given that query
containment in coCSP is characterized by homomorphisms
between templates, it is straightforward to show that
query containment for generalized coCSP with constant
symbols is NP-complete. Thus, Theorem 12 yields the
following NEXPTIME upper bound for query containment
in OBDA languages. The corresponding lower bound is
proved in the appendix by a non-trivial reduction of a
NEXPTIME-complete tiling problem.
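
To spell out one way of reading the homomorphism characterization for the case without constant symbols: $\mathrm{coCSP}(\mathcal{F}_1) \subseteq \mathrm{coCSP}(\mathcal{F}_2)$ holds iff every template in $\mathcal{F}_2$ maps homomorphically to some template in $\mathcal{F}_1$ (take a template of $\mathcal{F}_2$ as the input structure for one direction, and compose homomorphisms for the other). The following brute-force sketch checks this condition; the encoding, the function names, and the assumption that every template element occurs in some tuple are ours.

```python
from itertools import product

def domain(structure):
    """Elements that occur in some tuple of the structure."""
    return sorted({e for tuples in structure.values() for t in tuples for e in t},
                  key=repr)

def has_homomorphism(a, b):
    """Exhaustive search for a homomorphism from a to b."""
    a_dom, b_dom = domain(a), domain(b)
    for image in product(b_dom, repeat=len(a_dom)):
        h = dict(zip(a_dom, image))
        if all(tuple(h[e] for e in t) in b.get(rel, set())
               for rel, tuples in a.items() for t in tuples):
            return True
    return False

def co_csp_contained(f1, f2):
    """coCSP(f1) is contained in coCSP(f2), for template sets without constants."""
    return all(any(has_homomorphism(b2, b1) for b1 in f1) for b2 in f2)

def clique(n):
    """Template for n-colourability."""
    return {"E": {(i, j) for i in range(n) for j in range(n) if i != j}}

print(co_csp_contained([clique(3)], [clique(2)]))
print(co_csp_contained([clique(2)], [clique(3)]))
```

The first check succeeds because every graph that is not 3-colourable is also not 2-colourable; the second fails, with the triangle as a witness.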

Theorem 14 Query containment in ($\mathcal{SHIU}$,AQ$\cup$BAQ) is in
NEXPTIME. It is NEXPTIME-hard already for ($\mathcal{ALC}$,AQ)
and for ($\mathcal{ALC}$,BAQ).

It is a consequence of a result in [8] that query containment
is undecidable for $\mathcal{ALCF}$. We show in the appendix how
the slight gap pointed out in Footnote 3 can be bridged.

5.2 FO-Rewritability and Datalog-Rewritability 

One prominent approach to answering ontology-mediated
queries is to make use of existing relational database
systems or datalog engines, eliminating the ontology by
query rewriting [15, 18]. Specifically, an ontology-mediated
query $(\mathbf{S}, \mathcal{O}, q)$ is FO-rewritable if there exists an FO-query
over $\mathbf{S}$ that is equivalent to it and datalog-rewritable if
there exists a datalog program over $\mathbf{S}$ that defines it. It
is a consequence of [36] that every FO-rewritable query
in ($\mathcal{SHIU}$,AQ$\cup$BAQ) is datalog-rewritable as well, and
this is the case if and only if it is equivalent to a UCQ.
Example 2 illustrates that ontology-mediated queries are
not always rewritable into an FO-query, and the same holds
for datalog-rewritability. It is thus a central problem to
decide, given an ontology-mediated query, whether it is
FO-rewritable and whether it is datalog-rewritable. We
show that both problems are decidable and establish tight
complexity bounds, making use of the CSP connection.

On the CSP side, FO-rewritability corresponds to FO-definability,
and datalog-rewritability to datalog-definability.
Specifically, an $\mathbf{S}$-query $\mathrm{coCSP}(\mathcal{F})$ is FO-definable if there is
an FO-sentence $\varphi$ over $\mathbf{S}$ such that for all finite relational
structures $\mathfrak{A}$ over $\mathbf{S}$, we have $\mathfrak{A} \models \varphi$ iff $\mathfrak{A} \not\rightarrow \mathfrak{B}$ for all $\mathfrak{B}$
in $\mathcal{F}$. Similarly, $\mathrm{coCSP}(\mathcal{F})$ is datalog-definable if there exists
a datalog program $\Pi$ that defines it. FO-definability and
datalog-definability have been studied extensively for CSPs,
culminating in the following results.

Theorem 15 Deciding, for a given finite relational structure
$\mathfrak{B}$ without constant symbols, whether $\mathrm{coCSP}(\mathfrak{B})$ is FO-definable
is NP-complete S^Tj. The same holds true for
datalog-definability [22].⁴



From Theorem 12 we thus obtain NEXPTIME upper bounds
for deciding FO-rewritability and datalog-rewritability of
queries from ($\mathcal{SHI}$,BAQ). To capture the more important
AQs rather than only BAQs, we show that Theorem 15 can
be lifted, in a natural way, to generalized CSPs with constant
symbols. For each finite relational structure $\mathfrak{B}$ with constant
symbols $c_1, \ldots, c_n$, let us denote by $\mathfrak{B}^c$ the corresponding
relational structure without constant symbols over the
schema that contains additional relations $P_1, \ldots, P_n$, where

⁴ An NP algorithm is implicit in [22], based on results in j?], see
also [10]. We thank Benoit Larose and Libor Barto for pointing this
out.



each $P_i$ denotes the singleton set that consists of the element
denoted by $c_i$.

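Proposition 4 For every set of homomorphically incomparable
structures $\mathfrak{B}_1, \ldots, \mathfrak{B}_n$ with constant symbols,

1. $\mathrm{coCSP}(\mathfrak{B}_1, \ldots, \mathfrak{B}_n)$ is FO-definable iff $\mathrm{coCSP}(\mathfrak{B}_i^c)$ is
FO-definable for $1 \leq i \leq n$.

2. $\mathrm{coCSP}(\mathfrak{B}_1, \ldots, \mathfrak{B}_n)$ is datalog-definable iff $\mathrm{coCSP}(\mathfrak{B}_i^c)$
is datalog-definable for $1 \leq i \leq n$.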

Note that every set of structures $\mathfrak{B}_1, \ldots, \mathfrak{B}_n$ has a subset
$\mathfrak{B}'_1, \ldots, \mathfrak{B}'_m$ which consists of homomorphically incomparable
structures such that $\mathrm{coCSP}(\mathfrak{B}_1, \ldots, \mathfrak{B}_n)$ is equivalent
to $\mathrm{coCSP}(\mathfrak{B}'_1, \ldots, \mathfrak{B}'_m)$. Now, a careful combination of
Theorem 15 and Proposition 4 implies the following result.

Theorem 16 FO-definability and datalog-definability of 
generalized CSP with constant symbols is NP-complete. 



From Theorem 12 we obtain a NEXPTIME upper bound
for deciding FO-rewritability and datalog-rewritability in
OBDA languages based on AQs and BAQs. The corresponding
lower bounds are proved in the appendix using a reduction
from a NEXPTIME-hard tiling problem (the same problem
as in the lower bound for query containment).

Theorem 17 It is in NEXPTIME to decide FO-rewritability
and datalog-rewritability of queries in ($\mathcal{SHIU}$,AQ$\cup$BAQ).
Both problems are NEXPTIME-hard for ($\mathcal{ALC}$,AQ) and
($\mathcal{ALC}$,BAQ).

Modulo a minor difference in the treatment of instances that
are not consistent (see Footnote 3), it follows from a result
in [32] that FO-rewritability is undecidable for ($\mathcal{ALCF}$,AQ).
In the appendix, we show how to bridge the difference and
how to modify the proof so that the result also applies to
datalog-rewritability.

Theorem 18 FO-rewritability and datalog-rewritability are
undecidable for ($\mathcal{ALCF}$,AQ) and ($\mathcal{ALCF}$,BAQ).

6. Conclusion 

For a wide range of OBDA languages, based on de- 
scription logics and other first-order fragments, we estab- 
lished a connection with CSPs and MMSNP, through dis- 
junctive datalog, leading to new results concerning tractabil- 
ity dichotomies, query containment, FO-rewritability and 
datalog-rewritability. Our results apply to UCQs and to 
atomic queries (equivalently, instance queries). 

Another query language frequently used in OBDA with
description logics is conjunctive queries. The results in this
paper imply that there is a dichotomy between PTIME and
coNP for ($\mathcal{ALC}$,CQ) if and only if the Feder-Vardi conjecture
holds. We leave it open whether there is a natural
characterization of ($\mathcal{ALC}$,CQ) in terms of disjunctive datalog.
An interesting future research topic is to analyze FO-rewritability
and datalog-rewritability of ontology-mediated
queries based on UCQs as a decision problem. It follows
from our results that this is equivalent to deciding FO-definability
and datalog-definability of MMSNP formulas.
APPENDIX

A. Proofs for Section 3

We start with commenting on the use of $\mathrm{adom}(x)$ in rule
bodies of DDlog programs $\Pi$. Every rule $\rho$ in $\Pi$ that mentions
adom in the body is an abbreviation for the set of rules
that is obtained from $\rho$ by replacing every atom $\mathrm{adom}(x)$
with an atom $R(\mathbf{x})$, where $R$ is an EDB relation of $\Pi$ and
$\mathbf{x}$ is a tuple of variables that contains $x$ once and otherwise
only fresh variables that do not occur in $\Pi$. Let $\Pi'$ be obtained
from $\Pi$ by replacing all rules in this way. Exploiting
the fact that queries enjoy an active domain semantics, it is
not hard to show that $\Pi$ and $\Pi'$ define the same queries.
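
The expansion of adom atoms described above can be sketched as follows. This is an illustration only: the rule encoding (head and body as lists of atoms, each a predicate name with a tuple of variables), the function name, and the naming scheme for fresh variables (`_v0`, `_v1`, ..., assumed not to occur in the program) are our own assumptions.

```python
from itertools import count, product

def expand_adom(head, body, edb_arities):
    """Replace each adom(x) atom by R(...x...) for every EDB relation and position."""
    fresh = (f"_v{i}" for i in count())  # assumed not to clash with program variables
    other_atoms = [atom for atom in body if atom[0] != "adom"]
    adom_vars = [atom[1][0] for atom in body if atom[0] == "adom"]
    # One choice = an EDB relation together with the position that holds x.
    choices = [(rel, pos) for rel, arity in edb_arities.items() for pos in range(arity)]
    rules = []
    for combo in product(choices, repeat=len(adom_vars)):
        new_body = list(other_atoms)
        for x, (rel, pos) in zip(adom_vars, combo):
            args = tuple(x if j == pos else next(fresh)
                         for j in range(edb_arities[rel]))
            new_body.append((rel, args))
        rules.append((head, new_body))
    return rules

# goal(x) <- adom(x), A(y)   over EDB relations A (unary) and R (binary)
expanded = expand_adom([("goal", ("x",))],
                       [("adom", ("x",)), ("A", ("y",))],
                       {"A": 1, "R": 2})
for head, body in expanded:
    print(head, "<-", body)
```

For this rule the sketch produces three rules, one with A(x), one with R(x, fresh), and one with R(fresh, x), so that x ranges over exactly the elements that occur in some EDB fact.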

A.1 Proofs for Section 3.1

We remark that the direction "from ($\mathcal{ALC}$,AQ) to
MDDlog" of Theorem 1 is actually a consequence of Theorem 6,
which makes a strictly more general statement. We
still provide it here (and in the main paper) as a warmup
for the proof of Theorem 6. As an extra bit of notation, we
say that an assignment $\pi$ of elements of an instance $\mathfrak{D}$ to the
variables of a CQ $q$ is a match of $q$ in $\mathfrak{D}$ if $\mathfrak{D}$ satisfies $q$ under
$\pi$.

Theorem 1. (ALC,UCQ) and MDDlog have the same expressive
power.

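Proof (continued). We establish here the correctness of the
translation from (ALC,AQ) to MDDlog. Let $m$ be the arity
of $(\mathbf{S}, \mathcal{O}, q)$. We have to show the following.

Claim. For all instances $\mathfrak{D}$ over $\mathbf{S}$ and all $\mathbf{a} \in \mathrm{adom}(\mathfrak{D})^m$,
we have $\mathbf{a} \in \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$ iff $\mathbf{a} \in q_\Pi(\mathfrak{D})$.

"if". Assume that $\mathbf{a} \notin \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$. Then there is a
$(\mathrm{dom}', \mathfrak{D}') \in \mathrm{Mod}(\mathcal{O})$ such that $\mathfrak{D} \subseteq \mathfrak{D}'$ and $\mathbf{a} \notin q(\mathfrak{D}')$.
For each $b \in \mathrm{adom}(\mathfrak{D})$, let $\mu(b)$ be the unique type realized
at $b$ in $\mathfrak{D}'$. Let $\mathfrak{D}''$ be the instance that consists of the atoms
in $\mathfrak{D}$ and an atom $P_{\mu(b)}(b)$ for each $b \in \mathrm{adom}(\mathfrak{D})$. It can be
verified that $\mathfrak{D}''$ is a model of $\Pi$. In particular, $\mathbf{a} \notin q(\mathfrak{D}')$
implies $\mathbf{a} \notin q(\mathfrak{D}'')$, and thus the type-coherent model $\mathfrak{D}''$
of $\mathcal{O}$ witnesses that whenever a diagram $\delta(\mathbf{x})$ has a match
$\pi$ in $\mathfrak{D}''$ and $\delta(\mathbf{x})$ implies $q(\mathbf{x}')$, then $\pi(\mathbf{x}') \neq \mathbf{a}$. Since
$\mathrm{goal}(\mathbf{a}) \notin \mathfrak{D}''$, $\mathbf{a} \notin q_\Pi(\mathfrak{D})$.

"only if". Assume that $\mathbf{a} \notin q_\Pi(\mathfrak{D})$ and let $\mathfrak{D}' \in \mathrm{Mod}(\Pi)$
such that $\mathfrak{D} \subseteq \mathfrak{D}'$ and $\mathfrak{D}'$ does not contain $\mathrm{goal}(\mathbf{a})$. The
first two rules of $\Pi$ ensure that for each $a \in \mathrm{dom}(\mathfrak{D})$, there
is a unique type $\mu(a)$ such that $P_{\mu(a)}(a) \in \mathfrak{D}'$. By the
second rule of $\Pi$, there is a model $(\mathrm{dom}_a, \mathfrak{D}_a)$ of $\mathcal{O}$ in which
$\mu(a)$ is realized at $a$. We may assume that these models have
disjoint domains. Let the relational structure $(\mathrm{dom}'', \mathfrak{D}'')$ be
obtained by first taking the union of $(\mathrm{dom}_a, \mathfrak{D}_a)_{a \in \mathrm{adom}(\mathfrak{D})}$
and then adding to it all facts of $\mathfrak{D}$. To prove that
$\mathbf{a} \notin \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$, it suffices to show that

(i) $(\mathrm{dom}'', \mathfrak{D}'')$ is a model of $\mathcal{O}$ and

(ii) $\mathbf{a} \notin q(\mathfrak{D}'')$.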



For Point (i), let $\mu(d)$ be the unique type realized by $d$ in
$(\mathrm{dom}_a, \mathfrak{D}_a)$, for all $d \in \mathrm{dom}_a$. It is straightforward to show
by induction on the length of $C$ that for all concepts
$C \in \mathrm{cl}(\mathcal{O}, q) \cap \mathrm{sub}(\mathcal{O})$ and all $d \in \mathrm{dom}''$, we have
$(\mathrm{dom}'', \mathfrak{D}'') \models C[d]$ iff $C \in \mu(d)$. Since $\mathrm{cl}(\mathcal{O}, q)$ by
definition includes $C$ and $D$ whenever $C \sqsubseteq D$ is in $\mathcal{O}$, this
implies Point (i) as desired.

It thus remains to establish Point (ii). Assume to the contrary
that there is a disjunct $q'(\mathbf{x}')$ of $q$ such that $\mathbf{a} \in q'(\mathfrak{D}'')$,
that is, there is a match $\pi$ of $q'(\mathbf{x}')$ in $\mathfrak{D}''$ such that
$\pi(\mathbf{x}') = \mathbf{a}$. We define a diagram $\delta(\mathbf{x})$ based on the restriction
of the original model $\mathfrak{D}'$ of $\Pi$ to the image of $\pi$, as follows:
$\delta(\mathbf{x})$ contains all atoms $A(x)$ such that $\pi(x) \in \mathrm{adom}(\mathfrak{D}')$ and
$A(\pi(x)) \in \mathfrak{D}''$ (where $A$ might also be of the form $P_\mu$)
and all atoms $R(x, y)$ such that $\pi(x), \pi(y) \in \mathrm{adom}(\mathfrak{D}')$ and
$R(\pi(x), \pi(y)) \in \mathfrak{D}''$. Since $\delta(\mathbf{x})$ is satisfied in $\mathfrak{D}'$ under $\pi$
and by the last rule of $\Pi$, we can obtain the desired
contradiction by showing that $\delta(\mathbf{x})$ implies $q'(\mathbf{x}')$.

Thus, let $(\mathrm{dom}, \mathfrak{B}) \in \mathrm{Mod}(\mathcal{O})$ be type-coherent and let
$\tau$ be a match of $\delta(\mathbf{x})$ in $\mathfrak{B}$ such that $\tau(\mathbf{x}') = \mathbf{a}$. Consider
the CQs: $q_0$ is the restriction of $q'$ to those variables that $\pi$
maps to elements of $\mathfrak{D}'$; for each element $a \in \mathrm{adom}(\mathfrak{D}')$ in
the range of $\pi$, let $q_a$ be the CQ obtained by first taking the
restriction of $q$ to those variables that $\pi$ maps to elements
of $\mathrm{dom}_a$, and then identifying all variables that $\pi$ maps to
the same element (preserving the names of free variables).
Clearly, each $q_a$ has at most one free variable, which $\pi$ maps
to $a$. By construction of $\delta(\mathbf{x})$, all conjuncts of $q_0$ also occur
in $\delta(\mathbf{x})$ and thus $q_0$ is satisfied in $\mathfrak{B}$ under $\tau$. Now consider
a query $q_a$, $a \in \mathrm{adom}(\mathfrak{D}')$ in the range of $\pi$. Since $q_a$ has a
match in $\mathfrak{D}_a$ (such that, if $q_a$ has a free variable, it is mapped
to $a$) and $\mathfrak{D}_a$ realizes the type $\mu(a)$, we have $q_a \in \mu(a)$. By
construction of $\delta(\mathbf{x})$, we thus have that $P_{\mu(a)}(x)$ is in $\delta(\mathbf{x})$ for
every variable $x$ with $\pi(x) = a$ (of which there is at least
one). Since $\mathfrak{B}$ is type-coherent, we thus find a match $\tau_a$ of
$q_a$ in $\mathfrak{B}$ (such that, if $q_a$ has a free variable, $\tau_a$ maps it to $a$).
It is not hard to see that the matches $\tau$ and $\tau_a$, $a \in \mathrm{adom}(\mathfrak{D}')$
in the range of $\pi$, can be assembled into a match $\pi'$ of $q'$
in $\mathfrak{B}$ such that $\pi'(\mathbf{x}') = \mathbf{a}$. □

Theorem 2. (ALC,AQ) has the same expressive power as
unary connected simple MDDlog.

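Proof (continued). We establish here the correctness of
the translation from (ALC,AQ) to MDDlog. That is, we
show that, for every instance $\mathfrak{D}$ and element $a \in \mathrm{adom}(\mathfrak{D})$,
we have $a \in \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$ if and only if $a \in q_\Pi(\mathfrak{D})$.

"if". Assume that $a \notin \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$. Then there is
$(\mathrm{dom}, \mathfrak{D}') \in \mathrm{Mod}(\mathcal{O})$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ such that $a \notin q(\mathfrak{D}')$.
For each $b \in \mathrm{adom}(\mathfrak{D})$, let $\mu(b)$ be the unique type realized
at $b$ in $\mathfrak{D}'$. Let $\mathfrak{D}''$ be the instance that consists of the atoms
in $\mathfrak{D}$ and an atom $P_{\mu(b)}(b)$ for each $b \in \mathrm{adom}(\mathfrak{D})$. It can
be checked that $\mathfrak{D}''$ is a model of $\Pi$. Since $\mathrm{goal}(a) \notin \mathfrak{D}''$,
$a \notin q_\Pi(\mathfrak{D})$.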



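"only if". Assume that $a \notin q_\Pi(\mathfrak{D})$ and let $\mathfrak{D}'$ be a model
of $\Pi$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ that does not contain $\mathrm{goal}(a)$. For each
$b \in \mathrm{adom}(\mathfrak{D})$, let $\mu(b)$ be a type such that $P_{\mu(b)}(b) \in \mathfrak{D}'$.
Note that $A \notin \mu(a)$. For each $b \in \mathrm{adom}(\mathfrak{D})$, let $(\mathrm{dom}_b, \mathfrak{D}_b)$
be a model of $\mathcal{O}$ in which the type $\mu(b)$ is realized at $b$.
We may assume that these models have disjoint domains.
Let $(\mathrm{dom}'', \mathfrak{D}'')$ be obtained by first taking the union of
$(\mathrm{dom}_b, \mathfrak{D}_b)_{b \in \mathrm{adom}(\mathfrak{D})}$, and then adding to it all facts of $\mathfrak{D}$.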
Note that $a \notin q(\mathfrak{D}'')$. We show that $(\mathrm{dom}'', \mathfrak{D}'')$ is a model
of $\mathcal{O}$.

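Let $\mu(d)$ be the unique type realized by $d$ in $(\mathrm{dom}_a, \mathfrak{D}_a)$,
for all $d \in \mathrm{dom}_a$. We show the following by induction on
the length of $\varphi$:

(*) For all concepts $\varphi(x) \in \mathrm{cl}_k(\mathcal{O})$ and $d \in \mathrm{dom}''$, we have
$(\mathrm{dom}'', \mathfrak{D}'') \models \varphi[d]$ iff $\varphi \in \mu(d)$.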

The base case, as well as the inductive step for the
Boolean operators, is trivial. We treat the case of the $\exists R$
constructor (the argument for the $\forall R$ constructor is similar).
Thus let $\varphi$ be of the form $\exists R.\psi$ and let $d \in \mathrm{dom}_a$.

If $\varphi \in \mu(d)$, then $(\mathrm{dom}_a, \mathfrak{D}_a) \models \varphi[d]$. Since $\mathfrak{D}_a \subseteq \mathfrak{D}''$,
and using the induction hypothesis, we have that $\mathfrak{D}'' \models \varphi[d]$.

Conversely, suppose $(\mathrm{dom}'', \mathfrak{D}'')$ satisfies $\varphi(d)$, that is,
there is an element $e$ such that $(\mathrm{dom}'', \mathfrak{D}'')$ satisfies
$R(d, e) \wedge \psi(e)$. If $e \in \mathrm{dom}_a$, the claim (*) follows immediately
from the induction hypothesis. Otherwise, we must have that
$e \in \mathrm{adom}(\mathfrak{D})$ and, by induction hypothesis, $\psi \in \mu(e)$. It follows
that $\exists R.\psi \in \mu(d)$, because otherwise $P_{\mu(d)}(d) \wedge P_{\mu(e)}(e) \wedge R(d, e)$
would be a non-realizable diagram, and $\Pi$ would derive an
inconsistency. Therefore, using again the induction hypothesis,
we have that (*) holds. □

Theorem 3.

1. (ALCHIU,UCQ) has the same expressive power as
MDDlog and as (ALC,UCQ).

2. ($\mathcal{S}$,UCQ) and (ALCF,UCQ) are strictly more expressive
than (ALC,UCQ).

To complete the proof of Theorem 3, we need to show
that the queries from ($\mathcal{S}$,UCQ) and (ALCF,UCQ) indicated
in the proof sketch cannot be expressed in (ALC,UCQ), or
equivalently, MDDlog. We start by providing a means of
identifying queries which cannot be expressed in MDDlog,
using the notion of colored instances, defined as follows:

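Definition 1. Let $\mathbf{S}$ be a schema and $\mathcal{C}$ be a set of unary
predicates (colors) $\{C_1, \ldots, C_n\}$ disjoint from $\mathbf{S}$. A $\mathcal{C}$-colored
$\mathbf{S}$-structure is an $\mathbf{S} \cup \mathcal{C}$-structure $(\mathrm{dom}, \mathfrak{D})$ such that

• For every $d \in \mathrm{dom}$, $C_i(d) \in \mathfrak{D}$ for some $i$;

• If $C_i(d) \in \mathfrak{D}$, then $C_j(d) \notin \mathfrak{D}$ for every $j \neq i$.

$\mathfrak{D}$ is called a $\mathcal{C}$-coloring of an $\mathbf{S}$-structure $\mathfrak{D}'$ if $\mathfrak{D}'$ is the
$\mathbf{S}$-reduct of $\mathfrak{D}$.

Now for each $k > 0$, fix $\mathcal{C}_k$ with $|\mathcal{C}_k| = k$ and $\mathcal{C}_k \cap \mathbf{S} = \emptyset$.
Then a $k$-coloring of $\mathfrak{D}$ is simply a $\mathcal{C}_k$-coloring of $\mathfrak{D}$.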
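
To make Definition 1 concrete, the following Python sketch checks whether a set of facts is a coloring of a given structure. The encoding is an assumption made for illustration only: facts are pairs (relation name, tuple of elements), and the domain is taken to be the active domain of the uncolored structure.

```python
def is_coloring(colored_facts, base_facts, colors):
    """Check that colored_facts is a coloring of base_facts w.r.t. colors.

    Facts are (relation, args) pairs. The domain is assumed to be the set
    of elements that occur in base_facts (active-domain assumption).
    """
    domain = {e for _, args in base_facts for e in args}
    color_facts = {f for f in colored_facts if f[0] in colors}
    # The reduct to the original schema must be the original structure.
    if set(colored_facts) - color_facts != set(base_facts):
        return False
    # Every domain element carries exactly one color.
    for d in domain:
        if sum(1 for c in colors if (c, (d,)) in color_facts) != 1:
            return False
    # No color fact may mention an element outside the domain.
    return all(args[0] in domain for _, args in color_facts)
```

The two bullet points of the definition correspond to the "exactly one color" test in the loop.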



We will also utilize the notion of forbidden pattern problems
from [42, 30, 9], whose definition we recall here:

Definition 2. Given a set $\mathcal{F}$ of $\mathcal{C}$-colored $\mathbf{S}$-structures (called
forbidden patterns), we define $\mathrm{Forb}(\mathcal{F})$ as the set of all
$\mathbf{S}$-structures $\mathfrak{D}$ such that there exists a $\mathcal{C}$-coloring $\mathfrak{D}'$ of $\mathfrak{D}$
for which $\mathfrak{F} \not\to \mathfrak{D}'$ for every $\mathfrak{F} \in \mathcal{F}$. The forbidden
patterns problem defined by $\mathcal{F}$ is to decide whether a given
$\mathbf{S}$-structure belongs to $\mathrm{Forb}(\mathcal{F})$.
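
As an illustration of Definition 2, here is a brute-force membership test for Forb, reading the condition on each forbidden pattern as "there is no homomorphism from the pattern into the colored structure". The encoding of structures as sets of (relation, args) facts and the restriction to active-domain structures are assumptions of this sketch; the procedure is exponential and only meant for tiny examples.

```python
from itertools import product

def elements(facts):
    return sorted({e for _, args in facts for e in args})

def has_hom(src, dst):
    """True if some map sends every fact of src to a fact of dst."""
    dst = set(dst)
    dom_s, dom_d = elements(src), elements(dst)
    for image in product(dom_d, repeat=len(dom_s)):
        h = dict(zip(dom_s, image))
        if all((r, tuple(h[e] for e in args)) in dst for r, args in src):
            return True
    return False

def in_forb(facts, patterns, colors):
    """Brute-force test whether (adom(facts), facts) belongs to Forb(patterns)."""
    dom = elements(facts)
    for assignment in product(colors, repeat=len(dom)):
        colored = set(facts) | {(c, (d,)) for d, c in zip(dom, assignment)}
        # A coloring is good if no forbidden pattern maps into it.
        if not any(has_hom(p, colored) for p in patterns):
            return True
    return False
```

With two colors and the two monochromatic edges as forbidden patterns, `in_forb` expresses 2-colorability: a single edge is accepted, while a directed triangle is rejected.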

Analogously to coMMSNP, we can define a query language
coFPP consisting of all those Boolean queries $q_{\mathcal{F},\mathbf{S}}$
defined by

$q_{\mathcal{F},\mathbf{S}}(\mathfrak{D}) = 1$ iff $(\mathrm{adom}(\mathfrak{D}), \mathfrak{D}) \notin \mathrm{Forb}(\mathcal{F})$

with $\mathcal{F}$ a set of $\mathcal{C}$-colored $\mathbf{S}$-structures. It follows directly
from results in [42] that coMMSNP and coFPP have the
same expressive power. Combining this result with Proposition 2,
we obtain the following:

Proposition 5. coFPP and Boolean MDDlog have the same
expressive power.

We use Proposition 5 in the proof of the following lemma,
whose purpose is to establish a sufficient condition for
non-expressibility in MDDlog.

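Lemma 1. A Boolean query $Q$ over schema $\mathbf{S}$ does not belong
to MDDlog if for every $n, m > 0$, there exist $\mathbf{S}$-instances
$\mathfrak{D}_0$ and $\mathfrak{D}_1$ with $Q(\mathfrak{D}_0) = 0$ and $Q(\mathfrak{D}_1) = 1$ such
that for every $m$-coloring $\mathfrak{B}_0$ of $(\mathrm{adom}(\mathfrak{D}_0), \mathfrak{D}_0)$, there
exists an $m$-coloring $\mathfrak{B}_1$ of $(\mathrm{adom}(\mathfrak{D}_1), \mathfrak{D}_1)$ such that from
every $n$-element substructure of $\mathfrak{B}_1$ there is a homomorphism
to $\mathfrak{B}_0$.

Proof. Assume for a contradiction that the conditions of the
lemma hold for every $n, m > 0$ but that $Q$ is equivalent to
some query in MDDlog. Then, by Proposition 5, there is a
set $\mathcal{F}$ of $\mathcal{C}$-colored $\mathbf{S}$-structures such that for all $\mathbf{S}$-instances
$\mathfrak{D}$, we have $Q(\mathfrak{D}) = 1$ if and only if $(\mathrm{adom}(\mathfrak{D}), \mathfrak{D}) \notin
\mathrm{Forb}(\mathcal{F})$. Let $m_0 = |\mathcal{C}|$ and let $n_0$ be the maximal number
of elements in the domain of some $\mathfrak{F} \in \mathcal{F}$.

Take $\mathfrak{D}_0$ and $\mathfrak{D}_1$ satisfying the conditions of the lemma
for $m_0, n_0$. We use $\mathcal{C}$ for $\mathcal{C}_{m_0}$. As $Q(\mathfrak{D}_0) = 0$, there exists
a $\mathcal{C}$-coloring $\mathfrak{B}_0$ of $(\mathrm{adom}(\mathfrak{D}_0), \mathfrak{D}_0)$ such that $\mathfrak{F} \not\to \mathfrak{B}_0$ for
every $\mathfrak{F} \in \mathcal{F}$. It follows that there exists a $\mathcal{C}$-coloring $\mathfrak{B}_1$
of $(\mathrm{adom}(\mathfrak{D}_1), \mathfrak{D}_1)$ such that for every $n_0$-element subset
of $\mathrm{adom}(\mathfrak{B}_1)$, there exists a homomorphism from the induced
substructure of $\mathfrak{B}_1$ to $\mathfrak{B}_0$. Since $Q(\mathfrak{D}_1) = 1$, we know that there
must exist some $\mathfrak{F} \in \mathcal{F}$ such that $\mathfrak{F} \to \mathfrak{B}_1$. As $\mathfrak{F}$ contains at
most $n_0$ elements, we can compose this homomorphism with the
previous homomorphism to obtain a homomorphism of $\mathfrak{F}$ into
$\mathfrak{B}_0$, contradicting the choice of $\mathfrak{B}_0$ as a witness for
$(\mathrm{adom}(\mathfrak{D}_0), \mathfrak{D}_0) \in \mathrm{Forb}(\mathcal{F})$.

□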

Using the preceding lemma, we can now prove that the 
queries mentioned in the proof sketch cannot be expressed 
in MDDlog. 

Lemma 2 There exist queries in (S,UCQ) which do not be- 
long to MDDlog. 
Proof. Consider $(\mathbf{S}, \mathcal{O}, q)$ where $\mathbf{S} = \{R, S\}$, $\mathcal{O}$ asserts
transitivity of $R$ and $S$, and $q = \exists xy\,(R(x, y) \wedge S(x, y))$.

We apply Lemma 1. Assume that $m, n > 0$ are given. Let
k = n — 1 and k' = m'^+^. Define Si and So as follows: 

• $\mathfrak{D}_1$ has elements $e, f$ and $a_1, \ldots, a_k$ and $b_1, \ldots, b_k$ and
the atoms $R(e, a_1)$, $R(a_k, f)$ and $R(a_i, a_{i+1})$ for $1 \le i < k$,
and $S(e, b_1)$, $S(b_k, f)$ and $S(b_i, b_{i+1})$ for $1 \le i < k$.

• 2)o has elements ei, . . . , Cfe' and /i, . . . , /fe' as well as 
a{ , . . . , for 1 < :/ < fc' and fo^'-* , . . . , fejj'"' for 1 < 
j < i < k' . The atoms are R{ei,a\), R{al., fi) and 
i?(aj, aj^i) for 1 < i < fc' and 1 < j < fc. 

Moreover, 5(e,, 6^'^) for 1 < j < i < fc' and Sib]:^ Jj) 
fori < j < i < k' and 5(6*'-', J for 1 < Z < fc and 
1 < j < i < fc'. 

It is readily checked that Q(So) = and Q{Di) = 1, as re- 
quired. Let So be an m-coloring of (adom(2)o)j 2)o)- Since 
k = n — 1 and fc' = m'^+^ we find i, i' with i < i' such that 
the colorings of ei , , . . . , al , and e,/ , , . . . , o^' , /i' co- 
incide. Define an m-coloring of (adom(J)i), Si) by taking 
the coloring of ei,a\, ... ,a\, fi for e, ai , . . . , , / and the 

coloring of 6^'* , . . . , 6^'' for bi, . . . ,bk. Denote by 58i the 
resulting colored instance. 

Let C = {ci, . . . , c„} be a subset of Si with n elements. 
We define a function h from C to adom(f8o) as follows: 

• If e ^ C, then let h be the restriction of the following 
mapping to C: h{ai) = af , h{bi) = b]'' and h{f) = fi,; 

• If / ^ C, then let h be the restriction of the following 

mapping to C: h{ai) = a], h{bi) — b\'^ and h{e) — e^; 

• Otherwise there are a^^ , bj^ ^ C. Then let /i be the 
restriction of the following mapping to C: h{e) = ei, 
h{ai) — al for all I < io, h{ai) = a] for all I > io 
h{bi) = for all I, and h{f ) = 

It is easily verified that h is a homomorphism from C to So. 

□ 

Lemma 3. There exist queries in (ALCF,UCQ) which do
not belong to MDDlog.

Proof. Consider $Q = (\mathbf{S}, \mathcal{O}, A(x))$ where $\mathbf{S} = \{S, A\}$ and
$\mathcal{O}$ states that $S$ is functional. Set $\mathfrak{D}_1 = \{S(a, b), S(a, c)\}$
and $\mathfrak{D}_0 = \{S(a, b)\}$. Note that $q_Q(\mathfrak{D}_1) = 1$ (since no model
of $\mathcal{O}$ contains $\mathfrak{D}_1$) and $q_Q(\mathfrak{D}_0) = 0$. Pick $n, m > 0$, and let
$\mathfrak{B}_0$ be any $m$-coloring of $(\mathrm{adom}(\mathfrak{D}_0), \mathfrak{D}_0)$. We define an
$m$-coloring $\mathfrak{B}_1$ of $\mathfrak{D}_1$ by assigning $a, b$ the same colors as in $\mathfrak{B}_0$
and giving $c$ the same color as $b$. Then the mapping sending
$a$ to itself and $b, c$ to $b$ defines a homomorphism from $\mathfrak{B}_1$ to
$\mathfrak{B}_0$. It follows that $Q$ is not definable in MDDlog. □
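
The homomorphism used in the proof of Lemma 3 can be checked mechanically for small numbers of colors. The sketch below assumes that the smaller instance consists of the single fact S(a,b), which is what the mapping described in the proof suggests; the color names `C0`, `C1`, ... are hypothetical.

```python
from itertools import product

D0 = {("S", ("a", "b"))}                     # assumed smaller instance
D1 = {("S", ("a", "b")), ("S", ("a", "c"))}  # violates functionality of S
h = {"a": "a", "b": "b", "c": "b"}           # mapping used in the proof

def lemma3_homomorphism_holds(m):
    """For every m-coloring of D0, color D1 as in the proof (c gets the
    color of b) and check that h maps every fact into the colored D0."""
    for col_a, col_b in product(range(m), repeat=2):
        b0 = D0 | {(f"C{col_a}", ("a",)), (f"C{col_b}", ("b",))}
        b1 = D1 | {(f"C{col_a}", ("a",)), (f"C{col_b}", ("b",)),
                   (f"C{col_b}", ("c",))}
        for rel, args in b1:
            if (rel, tuple(h[e] for e in args)) not in b0:
                return False
    return True
```

Since the check succeeds for every coloring, the colored version of the larger instance always maps into the colored smaller instance, which is exactly the condition required by Lemma 1.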

Theorem 4. (ALC,AQ) has the same expressive power as
(SHI,AQ).



Proof. It is folklore (see, for example, fSSj) that for every
SHI-ontology $\mathcal{O}$ there exists an ALC-ontology $\mathcal{O}'$ (possibly
using additional concept names) such that (i) $\mathcal{O}' \models \mathcal{O}$
and (ii) for every $\mathfrak{A} \in \mathrm{Mod}(\mathcal{O})$ there exists an $\mathfrak{A}' \in \mathrm{Mod}(\mathcal{O}')$
with the same domain and interpreting the concept names
of $\mathcal{O}$ in the same way as $\mathfrak{A}$. It follows that (ALC,AQ) and
(SHI,AQ) are equally expressive. □

Theorem 5. (ALCU,AQ) and (SHIU,AQ) both have the same
expressive power as unary simple MDDlog.

Proof. We start with the proof that (ALCU,AQ) and
(SHIU,AQ) have the same expressive power. The proof
is similar to the proof of Theorem 4: also with the universal
role, it is known that for every SHIU-ontology $\mathcal{O}$
there exists an ALCU-ontology $\mathcal{O}'$ (possibly using additional
concept names) such that (i) $\mathcal{O}' \models \mathcal{O}$ and (ii) for
every $\mathfrak{A} \in \mathrm{Mod}(\mathcal{O})$ there exists a model $\mathfrak{A}' \in \mathrm{Mod}(\mathcal{O}')$
with the same domain and interpreting the concept names of
$\mathcal{O}$ in the same way as $\mathfrak{A}$. It follows that (ALCU,AQ) and
(SHIU,AQ) are equally expressive.

The translation from (ALCU,AQ) to unary simple
MDDlog queries is a modified version of the translation
given in the proof of Theorem 2 for the translation from
(ALC,AQ) to connected unary simple MDDlog queries.

Assume that $(\mathbf{S}, \mathcal{O}, q)$ with $q = A(x)$ is given. We take
types to be subsets of $\mathrm{sub}(\mathcal{O})$ and employ the same definition
of diagrams as in Theorem 2.

The MDDlog program $\Pi$ consists of the following rules:

$$\bigvee_{\tau \subseteq \mathrm{sub}(\mathcal{O})} P_\tau(x) \leftarrow \mathrm{adom}(x)$$

$$\bot \leftarrow \delta(\mathbf{x}) \quad \text{for all non-realizable diagrams } \delta(\mathbf{x}) \text{ of the form } P_\tau(x), \text{ or } P_{\tau_1}(x_1) \wedge P_{\tau_2}(x_2), \text{ or } P_{\tau_1}(x_1) \wedge P_{\tau_2}(x_2) \wedge S(x_1, x_2)$$

$$\mathrm{goal}(x) \leftarrow A(x)$$

Note that the only difference to the rules in the proof of
Theorem 2 are the rules of the form

$$\bot \leftarrow P_{\tau_1}(x_1) \wedge P_{\tau_2}(x_2)$$

which are not connected. $\Pi$ is still unary and simple.
Equivalence of $(\mathbf{S}, \mathcal{O}, q)$ and $q_\Pi$ can now be proved similarly to
Theorem 2.

Conversely, let $\Pi$ be a unary simple MDDlog program.
The rewriting of each rule of $\Pi$ into an equivalent ALCU-concept
inclusion is again similar to the proof of Theorem 2,
except that now one also has to consider non-connected bodies.
They can be translated using the universal role. For example,

$$P_1(x) \vee P_2(y) \leftarrow A(x) \wedge B(y)$$

is rewritten into $A \sqcap \exists U.(B \sqcap \neg P_2) \sqsubseteq P_1$. □
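
The example rewriting can be sanity-checked by brute force: over every interpretation of A, B, P1, P2 on a small domain, the rule and the concept inclusion should have the same truth value. The sketch below does this under the assumption that the universal role U is interpreted as the full binary relation on the domain; the function names are hypothetical.

```python
from itertools import product

def subsets(domain):
    for bits in product([False, True], repeat=len(domain)):
        yield {d for d, bit in zip(domain, bits) if bit}

def rule_holds(A, B, P1, P2):
    # P1(x) v P2(y) <- A(x) ^ B(y), for all x and y
    return all(x in P1 or y in P2 for x in A for y in B)

def inclusion_holds(A, B, P1, P2, domain):
    # A ⊓ ∃U.(B ⊓ ¬P2) ⊑ P1, with U the full relation on the domain
    witness = any(y in B and y not in P2 for y in domain)
    return all(x in P1 for x in A if witness)

def agree_on_all_interpretations(size):
    domain = list(range(size))
    return all(
        rule_holds(A, B, P1, P2) == inclusion_holds(A, B, P1, P2, domain)
        for A in subsets(domain) for B in subsets(domain)
        for P1 in subsets(domain) for P2 in subsets(domain)
    )
```

Both formulations fail exactly when some element of A lies outside P1 while some element of B lies outside P2, which is why the exhaustive comparison succeeds.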

We briefly discuss Boolean atomic queries (BAQs); i.e.,
queries of the form $\exists x.A(x)$, where $A$ is a unary relation
symbol. BAQs behave similarly to AQs and one can show
modified versions of Theorems 2 to 5 above in
which AQs are replaced by BAQs and unary goal predicates
by 0-ary goal predicates, respectively.

Theorem 19. Theorems 2 to 5 hold if AQs are replaced
by BAQs and unary goal predicates by 0-ary goal
predicates, respectively.

Proof. We show the required modifications to the proof
of Theorem 2. The remaining results are proved by similar
modifications and left to the reader. For the translation
from (ALC,BAQ) to Boolean connected simple MDDlog, the
only difference to the program constructed in the proof of
Theorem 2 is that $\mathrm{goal}(x) \leftarrow A(x)$ is replaced by
$\mathrm{goal} \leftarrow A(x)$. Conversely, for the translation from Boolean
connected simple MDDlog to (ALC,BAQ) we regard goal as
a concept name and take the BAQ $\exists x.\mathrm{goal}(x)$. Now the only
additional difference is the rewriting of goal-rules. For
example, $\mathrm{goal} \leftarrow R(x, y)$ is rewritten into $\exists R.\top \sqsubseteq \mathrm{goal}$.

□



A.2 Proofs for Section 3.2

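Theorem 6. (UNFO,UCQ) has the same expressive power as
MDDlog.

Proof (continued). We establish here the correctness of
the translation from (UNFO,UCQ) to MDDlog. That is,
we show that, for every instance $\mathfrak{D}$ and elements
$\mathbf{a} = a_1, \ldots, a_n \in \mathrm{adom}(\mathfrak{D})$, we have $\mathbf{a} \in \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$ if and
only if $\mathbf{a} \in q_\Pi(\mathfrak{D})$.

"if". Assume that $\mathbf{a} \notin \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$. Then there is
$(\mathrm{dom}, \mathfrak{D}') \in \mathrm{Mod}(\mathcal{O})$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ such that $\mathbf{a} \notin q(\mathfrak{D}')$.
For each $a \in \mathrm{adom}(\mathfrak{D})$, let $\mu(a)$ be the unique type realized
at $a$ in $\mathfrak{D}'$. Let $\mathfrak{D}''$ be the instance that consists of the atoms
in $\mathfrak{D}$ and an atom $P_{\mu(a)}(a)$ for each $a \in \mathrm{adom}(\mathfrak{D})$. It can
be checked that $\mathfrak{D}''$ is a model of $\Pi$. Since $\mathrm{goal}(\mathbf{a}) \notin \mathfrak{D}''$,
$\mathbf{a} \notin q_\Pi(\mathfrak{D})$.

"only if". Assume that $\mathbf{a} \notin q_\Pi(\mathfrak{D})$ and let $\mathfrak{D}'$ be a model
of $\Pi$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ that does not contain $\mathrm{goal}(\mathbf{a})$. For each
$a \in \mathrm{adom}(\mathfrak{D})$, let $\mu(a)$ be a type such that $P_{\mu(a)}(a) \in \mathfrak{D}'$ and
let $(\mathrm{dom}_a, \mathfrak{D}_a)$ be a model of $\mathcal{O}$ in which $\mu(a)$ is realized at
$a$ (such a model must exist because otherwise the diagram
$P_{\mu(a)}(x)$ would be non-realizable and $\Pi$ would include a
rule $\bot \leftarrow P_{\mu(a)}(x)$). We may assume that these models
have disjoint domains. Let $(\mathrm{dom}'', \mathfrak{D}'')$ be obtained by first
taking the union of $(\mathrm{dom}_a, \mathfrak{D}_a)_{a \in \mathrm{adom}(\mathfrak{D})}$, and then adding
to it all facts of $\mathfrak{D}$. We show that

(i) $(\mathrm{dom}'', \mathfrak{D}'')$ is a model of $\mathcal{O}$ and

(ii) $\mathbf{a} \notin q(\mathfrak{D}'')$.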

We start with the first claim. Let $\mu(d)$ be the unique type
realized by $d$ in $(\mathrm{dom}_a, \mathfrak{D}_a)$, for all $d \in \mathrm{dom}_a$. We show the
following by induction on the length of $\varphi$:

(*) For all formulas $\varphi(x) \in \mathrm{cl}_k(\mathcal{O})$ and $d \in \mathrm{dom}''$, we have
$(\mathrm{dom}'', \mathfrak{D}'') \models \varphi[d]$ iff $\varphi \in \mu(d)$.

Note that, since all types by definition include $\mathcal{O}$, this
implies (i).

The base case, as well as the inductive step for formulas
of the form $\neg\psi(x)$, is trivial. Thus let $\varphi$ be of the form
$\exists \mathbf{y} \bigwedge_i \psi_i(x, \mathbf{y})$ and let $d \in \mathrm{dom}_a$. We may furthermore
assume that $\varphi$ is connected, meaning that the graph whose
nodes are the subformulas $\psi_i$ and containing an edge between
$\psi_i$ and $\psi_j$ if they share a free variable, is connected. This is
because, if $\varphi$ is not connected, then the claim follows
immediately from the analogous claims for each of the connected
components.

If $\varphi \in \mu(d)$, then $(\mathrm{dom}_a, \mathfrak{D}_a) \models \varphi[d]$. Since $\mathfrak{D}_a \subseteq \mathfrak{D}''$,
and using the induction hypothesis for the $\psi_i$ that are not
atomic, we have that $\mathfrak{D}'' \models \varphi[d]$.

Conversely, suppose $(\mathrm{dom}'', \mathfrak{D}'') \models \varphi[d]$, that is,
$(\mathrm{dom}'', \mathfrak{D}'')$ satisfies $\bigwedge_i \psi_i(x, \mathbf{y})$ for some assignment $\pi$ of
elements of $\mathrm{dom}''$ to the variables $x, \mathbf{y}$ such that $\pi(x) = d$.

First assume that the image of tt is entirely contained in 
the domQ. Then (doma, Da) |= i^[d] (using the induction 
hypothesis), and therefore (doma, Da) H fid] as required. 
Now assume that this is not the case. By the connectedness 
assumption, the set / C adom(D) consisting of the elements 
that are in the range of tt is non-empty and contains a. In 
what follows we will define a number of formulas by syn- 
tactic operations on ip. It will follow from the definition of 
clfe(C') that each of these formulas again belongs to c\k{0), 
and hence, is subject to the induction hypothesis. Let ip' be 
obtained from tp by identifying all variables z, z' such that 
7r(z) = tt{z') E I. We assume that the free variable x re- 
tains its name. For each b E I, let zi, E yU{x} be the unique 
variable in ip' with 7r(zf,) = b. Let ip'^^ be the restriction of ip 
to those ipi containing only variables z with tt{z) in domf,, 
with free variable zi,. We have (dom", D") \= ip'i,[b] via the 
restriction of the match tt to the variables in ipi,, thus, by the 
earlier argument (since all witnessing elements are contained 
in domfc), E p{b). Let ipQ be ip', but with free variable Za 
instead of x. 

If it were the case that ip'g ^ /i(a) then the diagram 6 con- 
sisting of all facts over / in D would not realizable. Hence, 
by definition of 11 and the fact that D satisfies 11 we must 
therefore have ip'^ E /i(a) and thus there is a match tt' of 
ip'g in Da. We can thus find a satisfying assignment tt" of 
Ip' mapping x to d, such that the range of tt" lies entirely 
inside doma as follows: for all x with tt{x) in doma, set 
tt"{x) — tt{x); for all other x, set tt"{x) tt'{x). It can 
be verified that tt" is indeed a satisfying assignment of ip\ 
Moreover, 7r"(y) = d. Therefore, (doma, Da) H p[d] and 
hence p E p{d) as required. 

Finally, we show (ii) $\mathbf{a} \notin q(\mathfrak{D}'')$, in a similar way. We
suppose, for the sake of contradiction, that $\mathbf{a} \in q(\mathfrak{D}'')$ under
some assignment $\pi$ for the existentially quantified variables
in $q$. Let $\mathbf{b}$ be the elements of $\mathrm{adom}(\mathfrak{D})$ belonging to
the range of $\pi$. Then, in the same way as above, we can
decompose $q$ into unary subqueries $q_b$ that are satisfied in
the different subinstances $\mathfrak{D}_b$, with $b \in \mathbf{b}$, and conclude that
$q_b(z) \in \mu(b)$ for each $b \in \mathbf{b}$, and therefore the diagram
consisting of all facts in $\mathfrak{D}'$ over $\mathbf{a}, \mathbf{b}$ implies $q$, which contradicts
the fact that $\mathfrak{D}'$ is a model of $\Pi$. □

Proposition 1. The Boolean query

(†) there are $a_1, \ldots, a_n, b$, with $n \ge 2$, such that $A(a_1)$,
$B(a_n)$, and $P(a_i, b, a_{i+1})$ for all $1 \le i < n$

is definable in (GF,UCQ) and not in MDDlog.
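
Read directly, the query (†) asks for a walk of length at least one from an A-element to a B-element in which every step uses a P-fact with the same middle component b. A minimal Python sketch of a direct evaluation follows; the encoding of the instance as two sets of elements and a set of triples is an assumption of this illustration.

```python
from collections import deque

def query_dagger(A, B, P):
    """Evaluate the Boolean query (†) on an instance.

    A, B: sets of elements; P: set of triples (x, b, y).
    Returns True iff there are a_1..a_n (n >= 2) and a single b with
    A(a_1), B(a_n) and P(a_i, b, a_{i+1}) for all 1 <= i < n.
    """
    for b in {mid for _, mid, _ in P}:
        succ = {}
        for x, mid, y in P:
            if mid == b:
                succ.setdefault(x, set()).add(y)
        # Elements reachable from an A-element by at least one b-step.
        frontier = deque(y for x in A for y in succ.get(x, ()))
        seen = set(frontier)
        while frontier:
            x = frontier.popleft()
            if x in B:
                return True
            for y in succ.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return False
```

A chain whose steps all share the same middle element satisfies the query, whereas a chain that switches the middle element along the way does not.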

Proof. Let $\mathbf{S}$ consist of unary predicates $A, B$ and a ternary
predicate $P$, and let $Q$ be the $\mathbf{S}$-query defined by (†). We
first note that $Q$ can be expressed by the (GF,UCQ) query
$q_{\mathbf{S}, \mathcal{O}, \exists x\, U(x)}$ where

$$\mathcal{O} = \forall xyz\, (P(x, z, y) \to (A(x) \to R(z, x))) \;\wedge\; \forall xyz\, (P(x, z, y) \to (R(z, x) \to R(z, y))) \;\wedge\; \forall xy\, (R(x, y) \to (B(y) \to U(y)))$$

It thus suffices to show that $Q$ cannot be expressed in
MDDlog. We make use of the characterization of MDDlog
queries in terms of $k$-colorings provided by Lemma 1.

Assume that m, n are given. Let k = to" + 2n. Define 
S-instances S)i and J)o as follows: 

• £>! has elements di, . . . ,dk,e and the atoms A{di), 
B{dk), and P{di, e, di+i) for 1 < z < fc. 

• Do has elements di, . . . , dk, and ei, . . . , e/c and the fol- 
lowing atoms: A{di), B{dk), and P{di,ej,di+i) when- 
ever l<i<k, l<j<k, and j ^ i. 

It is readily checked that Q{T)i) = 1 and Q{Do) = 0, as 
required. Let *Bo be a C-coloring of S)o, where C has m ele- 
ments. Define a C-coloring *8i of J) i by giving all elements 
of {di, . . . , dk} exactly the same color as in Sq. Choose i 
with n<i<k~n in such a way that for every sequence 
di, . . . , di+n with I > 1 and I + n < k there exists a se- 
quence di', . . . , di'+n with I' > 1 and I' + n < k such that 
the coloring of d; , . . . , di+„ coincides with the coloring of 
di', . . . , d/'+n and i ^ {/', I' + n}. Such an i exists since 
k > + 2n. Now give e the color of e^. One can now 
easily construct, for every n-element subset of *8i, a homo- 
morphism to Sq. □ 

Theorem 7. (GF,UCQ) and (GNFO,UCQ) have the same
expressive power as frontier-guarded DDlog.

Proof. The translation from frontier-guarded DDlog to
(GNFO,UCQ) is straightforward: let $\Pi$ be a frontier-guarded
DDlog query. Every rule of $\Pi$ whose conclusion
is an IDB relation other than goal can be viewed as a GNFO
sentence (over the schema consisting of all EDBs and IDBs
other than the IDB relation goal). Note that frontier-guarded
DDlog rules can indeed be viewed as first-order formulas
that, after writing out the implication in terms of conjunction
and negation, belong to GNFO. We simply take $\mathcal{O}$ to
be the set of all these rules of $\Pi$, viewed as GNFO sentences,
and we take $q$ to be the UCQ consisting of all bodies of rules
whose conclusion contains the IDB relation goal. It is easy
to verify that the ontology-mediated query $(\mathbf{S}, \mathcal{O}, q)$, where $\mathbf{S}$ is
the schema consisting of all EDB relations, is equivalent
to the frontier-guarded DDlog query $\Pi$.

Next, we explain how to translate (GNFO,UCQ) to
(GF,UCQ). Let $(\mathbf{S}, \phi, q)$ be any ontology-mediated query, where
$\phi$ is a GNFO-sentence and $q = q(\mathbf{x})$ is a UCQ. It follows
from results in [6] that there is a GF-sentence $\phi^*$,
possibly containing additional relations $R_1, \ldots, R_n$, and a
Boolean UCQ $q^*$, such that $\phi$ is equivalent to the second-order
sentence $\exists R_1, \ldots, R_n (\phi^* \wedge \neg q^*)$. Consequently, we
have that the ontology-mediated query $(\mathbf{S}, \phi, q)$ is equivalent
to the ontology-mediated query $(\mathbf{S}, \phi^*, q')$ where $q'(\mathbf{x})$
is $q(\mathbf{x}) \vee (q^* \wedge \bigwedge_i \mathrm{adom}(x_i))$ (which can be equivalently
written as a UCQ).

Finally, the translation from (GNFO,UCQ) to frontier-guarded
DDlog is an adaptation of the translation from
(UNFO,UCQ) to MDDlog that we gave earlier. Recall that
we used a specific normal form for UNFO sentences. For
GNFO we can use an analogous normal form. Specifically,
we can assume that $\mathcal{O}$ is generated by the following grammar:

$$\phi(\mathbf{x}) ::= \top \mid \alpha(\mathbf{x}) \wedge \neg\phi(\mathbf{x}) \mid \exists \mathbf{y}\,(\psi_1(\mathbf{x}, \mathbf{y}) \wedge \cdots \wedge \psi_n(\mathbf{x}, \mathbf{y}))$$

where each $\psi_i$ is either a relational atom or a formula
generated by the same grammar. The "guard" $\alpha$ is an atomic
formula, possibly an equality, containing all variables in $\mathbf{x}$.

Let $\mathrm{sub}(\mathcal{O})$ be the set of all subformulas of $\mathcal{O}$. Let $k$ be the
maximum of the number of variables in $\mathcal{O}$ and the number
of variables in $q$. For $\ell > 0$, we denote by $\mathrm{cl}_k^\ell(\mathcal{O})$ the set of
all formulas $\chi(\mathbf{x})$ with $\mathbf{x} = x_1, \ldots, x_\ell$ of the form

$$\exists \mathbf{y}\,(\psi_1(\mathbf{x}, \mathbf{y}) \wedge \cdots \wedge \psi_n(\mathbf{x}, \mathbf{y}))$$

with $\mathbf{y} = y_1, \ldots, y_m$, $m + \ell \le k$, where each $\psi_i$ is either an
atomic formula or is of the form $\chi(\mathbf{z})$ for some $\chi \in \mathrm{sub}(\mathcal{O})$.
We apply a one-to-one renaming of variables where needed
to ensure that each formula in $\mathrm{cl}_k^\ell(\mathcal{O})$ has exactly the same
free variables $x_1, \ldots, x_\ell$. A guarded $\ell$-type $\tau(x_1, \ldots, x_\ell)$ is
a subset of $\mathrm{cl}_k^\ell(\mathcal{O})$ that contains at least one atomic relation
(possibly equality) containing all variables $x_1, \ldots, x_\ell$, and
also containing the sentence $\mathcal{O}$ itself. We denote the set of
all guarded $\ell$-types by $\mathrm{type}_\ell(\mathcal{O})$. Note that, by definition,
there are no guarded $\ell$-types for $\ell$ greater than the maximal
arity of a relation from $\mathbf{S}$.

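We now proceed the same way as we did in the case of
UNFO (but using guarded $\ell$-types instead of unary types).
We introduce a fresh $\ell$-ary relation symbol $P_\tau$ for each
guarded $\ell$-type $\tau(x_1, \ldots, x_\ell)$, and we denote by $\mathbf{S}'$ the
schema that extends $\mathbf{S}$ with these additional relations.
Diagrams, realizability, and implying a query are defined in
the same way as before in the case of UNFO. The DDlog
program is also constructed in the same way, except that the
first rule of the program is replaced by the following two sets
of rules:

$$\bigvee_{\tau(\mathbf{x}) \text{ a guarded } \ell\text{-type with } R(\mathbf{x}) \in \tau} P_\tau(\mathbf{x}) \leftarrow R(\mathbf{x}) \quad \text{for each relation } R \text{ of arity } \ell > 0$$

and

$$\bot \leftarrow P_\tau(\mathbf{x}) \wedge P_{\tau'}(\mathbf{y})$$

for all guarded types $\tau$ and $\tau'$ (possibly of different arity)
and tuples of variables $\mathbf{x}, \mathbf{y}$ such that $\tau(\mathbf{x})$ and $\tau'(\mathbf{y})$ differ
on the formulas in $\mathbf{x} \cap \mathbf{y}$ they contain.

The correctness proof is entirely analogous to the proof
for UNFO as well. We establish here the correctness of
the translation. That is, we show that, for every instance
$\mathfrak{D}$ and elements $\mathbf{a} = a_1, \ldots, a_n \in \mathrm{adom}(\mathfrak{D})$, we have
$\mathbf{a} \in \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$ if and only if $\mathbf{a} \in q_\Pi(\mathfrak{D})$.

"if". Assume that $\mathbf{a} \notin \mathrm{cert}_{q,\mathcal{O}}(\mathfrak{D})$. Then there is
$(\mathrm{dom}, \mathfrak{D}') \in \mathrm{Mod}(\mathcal{O})$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ such that $\mathbf{a} \notin q(\mathfrak{D}')$.
For every fact $R(\mathbf{b})$ of $\mathfrak{D}$, let $\mu(\mathbf{b})$ be the unique guarded
$\ell$-type (for $\ell = |\mathbf{b}|$) realized at $\mathbf{b}$ in $\mathfrak{D}'$. Let $\mathfrak{D}''$ be the instance
that consists of the atoms in $\mathfrak{D}$ and the atom $P_{\mu(\mathbf{b})}(\mathbf{b})$ for
each fact $R(\mathbf{b})$ in $\mathfrak{D}$. It can be checked that $\mathfrak{D}''$ is a model
of $\Pi$. Since $\mathrm{goal}(\mathbf{a}) \notin \mathfrak{D}''$, $\mathbf{a} \notin q_\Pi(\mathfrak{D})$.

"only if". Assume that $\mathbf{a} \notin q_\Pi(\mathfrak{D})$ and let $\mathfrak{D}'$ be a model
of $\Pi$ with $\mathfrak{D} \subseteq \mathfrak{D}'$ that does not contain $\mathrm{goal}(\mathbf{a})$. We say
that a tuple $\mathbf{b}$ is "live" in $\mathfrak{D}$ if $\mathfrak{D}$ contains $R(\mathbf{b})$ for some
relation symbol $R$. For each live tuple $\mathbf{b}$ of $\mathfrak{D}$, let $\mu(\mathbf{b})$ be a
guarded $\ell$-type (with $\ell = |\mathbf{b}|$) such that $P_{\mu(\mathbf{b})}(\mathbf{b}) \in \mathfrak{D}'$ and
let $(\mathrm{dom}_\mathbf{b}, \mathfrak{D}_\mathbf{b})$ be a model of $\mathcal{O}$ in which $\mu(\mathbf{b})$ is realized at
$\mathbf{b}$ (such a model must exist because otherwise the diagram
$P_{\mu(\mathbf{b})}(\mathbf{x})$ would be non-realizable and $\Pi$ would include a
rule $\bot \leftarrow P_{\mu(\mathbf{b})}(\mathbf{x})$). We may assume that for distinct live
tuples $\mathbf{b}$ and $\mathbf{c}$, $\mathrm{dom}_\mathbf{b}$ and $\mathrm{dom}_\mathbf{c}$ overlap only (possibly) on
$\{\mathbf{b}\} \cap \{\mathbf{c}\}$. Let $(\mathrm{dom}'', \mathfrak{D}'')$ be obtained by first taking the
union of $(\mathrm{dom}_\mathbf{b}, \mathfrak{D}_\mathbf{b})$ for all live tuples $\mathbf{b}$ of $\mathfrak{D}$, and then
adding to it all facts of $\mathfrak{D}$. We show that

(i) $(\mathrm{dom}'', \mathfrak{D}'')$ is a model of $\mathcal{O}$ and

(ii) $\mathbf{a} \notin q(\mathfrak{D}'')$.

For all live tuples $\mathbf{d}$ of $\mathfrak{D}_\mathbf{b}$, let $\mu(\mathbf{d})$ be the unique guarded
$\ell$-type realized by $\mathbf{d}$ in $(\mathrm{dom}_\mathbf{b}, \mathfrak{D}_\mathbf{b})$. Note
that a tuple $\mathbf{d}$ may be live in $\mathfrak{D}_\mathbf{b}$ for several different choices
of $\mathbf{b}$, but the last set of rules that we added to the program
$\Pi$ guarantees that the guarded $\ell$-type realized by $\mathbf{d}$ in each
such $(\mathrm{dom}_\mathbf{b}, \mathfrak{D}_\mathbf{b})$ is the same.

Claim (i) is proved by establishing the following, by
induction on the length of $\varphi$:

(*) For all formulas $\varphi(\mathbf{x}) \in \mathrm{cl}_k^\ell(\mathcal{O})$ and for each live $\ell$-tuple
$\mathbf{d}$ of $\mathfrak{D}''$, we have $(\mathrm{dom}'', \mathfrak{D}'') \models \varphi[\mathbf{d}]$ iff $\varphi \in \mu(\mathbf{d})$.

We omit the proof of (*) and of (ii), which proceeds as in the
proof of Theorem 1. □

B. Proofs for Section 4

In Section B.1, we start with establishing a central technical
result about MMSNP extended with constant symbols,
which allows us to lift several results from MMSNP to MMSNP
with constants. In Section B.2, we then provide the
proofs for the results stated in Section 4 of the main paper.
Here, the result from Section B.1 is needed to deal with MMSNP
with free variables.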

B.1 MMSNP with Constant Symbols

For readability, throughout this subsection, we will adopt
a more convenient notation for structures. If $\mathbf{S}$ is a schema
and $\mathbf{c}$ a (possibly empty) set of constant symbols, then an
$\mathbf{S} \cup \mathbf{c}$-structure $\mathfrak{B}$ will be given by a pair $(\mathrm{dom}(\mathfrak{B}), \cdot^{\mathfrak{B}})$,
where $\mathrm{dom}(\mathfrak{B})$ is a finite, non-empty set and $\cdot^{\mathfrak{B}}$ is a function
assigning to each $n$-ary predicate in $\mathbf{S}$ an $n$-ary relation
over $\mathrm{dom}(\mathfrak{B})$ and to each constant symbol $c \in \mathbf{c}$ an element
$c^{\mathfrak{B}} \in \mathrm{dom}(\mathfrak{B})$. We use $\mathrm{adom}(\mathfrak{B})$ to denote the active
domain of $\mathfrak{B}$, and we call $\mathfrak{B}$ an active domain structure if
$\mathrm{dom}(\mathfrak{B}) = \mathrm{adom}(\mathfrak{B})$.

Our objective is to establish the following theorem, which
lifts the containment and dichotomy results for MMSNP
sentences [20] to coMMSNP queries:

Theorem 20. coMMSNP has a dichotomy between PTime
and coNP iff the Feder-Vardi conjecture holds. Containment
of coMMSNP queries is decidable.



We prove Theorem 20 in several steps. We consider
the language MMSNP with constant symbols (abbreviated
MMSNPc), consisting of all sentences which can be obtained
from MMSNP formulas by replacing each free variable
by a constant symbol. The evaluation problem for
MMSNPc consists in deciding whether an MMSNPc sentence
with schema $\mathbf{S}$ and constant symbols $\mathbf{c}$ holds in a given
$\mathbf{S} \cup \mathbf{c}$-structure $\mathfrak{B}$. The containment problem for MMSNPc
is to decide for two MMSNPc sentences $\Psi_1, \Psi_2$ with relations
$\mathbf{S}$ and constant symbols $\mathbf{c}$, whether $\mathfrak{B} \models \Psi_1$ implies
$\mathfrak{B} \models \Psi_2$ for all $\mathbf{S} \cup \mathbf{c}$-structures $\mathfrak{B}$. We use $\Psi_1 \subseteq \Psi_2$ to
denote containment.

MMSNPc will serve as a bridge between coMMSNP 
queries (with free variables) and MMSNP sentences. More 
precisely, we will first show that evaluation of coMM- 
SNP queries is polynomially equivalent to evaluation of 
MMSNPc sentences, and show a polynomial reduction from 
coMMSNP query containment to containment of MMSNPc 
sentences. Afterwards, we will move from MMSNPc sen- 
tences to MMSNP sentences, again showing polynomial 
equivalence of the evaluation problems and a polynomial re- 
duction for containment. 

To link coMMSNP queries and MMSNPc, it will
actually prove more convenient to suppose that MMSNPc sentences
are interpreted over active domain structures, whereas
to relate MMSNPc with plain MMSNP, we will wish to work
over arbitrary structures. Thus, as a preliminary step, we
relate the two variants of the MMSNPc evaluation and
containment problems.

Lemma 4. The evaluation problem for MMSNPc restricted
to active domain structures is polynomially equivalent to the
evaluation problem for MMSNPc (over general structures).

Proof. Let $\Phi = \exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m \varphi$
be an MMSNPc sentence over schema $\mathbf{S}$ and constants
$\mathbf{c}$, which is interpreted over active domain structures.
Pick a fresh second-order variable $Y$ and a fresh constant $c$ not
appearing in $\mathbf{c}$. Let $\varphi'$ be the formula obtained
from $\varphi$ by replacing every conjunct $\alpha \to \beta$ by
$\alpha \to (\beta \vee Y(c))$. Let $\psi$ be the conjunction of
all formulas of the form $R(x_1, \ldots, x_k) \to \neg Y(x_i)$,
where $R$ is a $k$-ary relation in $\mathbf{S}$, and $x_i$ one of
the variables among $x_1, \ldots, x_k$. Define a new MMSNPc sentence

$\Phi' = \exists X_1 \cdots \exists X_n \exists Y \forall x_1 \cdots \forall x_m (\varphi' \wedge \psi)$

We claim that the evaluation problem for $\Phi$ over active domain
structures is polynomially equivalent to the evaluation problem for
$\Phi'$ over general structures. The first reduction is trivial
since for every $\mathbf{S} \cup \mathbf{c}$-structure $\mathfrak{A}$
such that $\mathrm{dom}(\mathfrak{A}) = \mathrm{adom}(\mathfrak{A})$,
we have $\mathfrak{A} \models \Phi$ if and only if
$\mathfrak{A} \models \Phi'$. To see why, notice that $\psi$ ensures
that $Y$ is false everywhere on the active domain, so the additional
disjuncts have no effect. For the second reduction, we remark that
$\mathfrak{B} \models \Phi'$ for a general
$\mathbf{S} \cup \mathbf{c}$-structure $\mathfrak{B}$ if and only if
$\mathrm{dom}(\mathfrak{B}) \neq \mathrm{adom}(\mathfrak{B})$ (since
we can trivially satisfy $\Phi'$ by sending $c$ to an element outside
the active domain and including that element in $Y$) or
$\mathrm{dom}(\mathfrak{B}) = \mathrm{adom}(\mathfrak{B})$ and
$\mathfrak{B} \models \Phi$.

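It remains to be shown that every evaluation problem for MMSNPc
over general structures is polynomially equivalent to an evaluation
problem for MMSNPc over active domain structures. Let $\Phi$ be an
MMSNPc sentence, and let $\Phi'$ be the sentence over schema
$\mathbf{S} \cup \{\mathsf{Elem}\}$ obtained from $\Phi$ by replacing
every conjunct $\alpha \to \beta$ with set of terms $T$ by
$\alpha \wedge \bigwedge_{t \in T} \mathsf{Elem}(t) \to \beta$. We
claim that the evaluation problem for $\Phi$ over general structures
is polynomially equivalent to the evaluation problem for $\Phi'$
over active domain structures. For the first reduction, we have
$\mathfrak{B} \models \Phi$ if and only if
$\mathfrak{B}' \models \Phi'$, where $\mathfrak{B}'$ extends
$\mathfrak{B}$ by setting
$\mathsf{Elem}^{\mathfrak{B}'} = \mathrm{dom}(\mathfrak{B})$. For the
other reduction, we have that for every
$\mathbf{S} \cup \{\mathsf{Elem}\} \cup \mathbf{c}$-structure
$\mathfrak{B}$ with
$\mathrm{dom}(\mathfrak{B}) = \mathrm{adom}(\mathfrak{B})$,
$\mathfrak{B} \models \Phi'$ if and only if
$\mathfrak{B}_{\mathsf{Elem}} \models \Phi$, where
$\mathfrak{B}_{\mathsf{Elem}}$ is the restriction of $\mathfrak{B}$
to $\mathsf{Elem}^{\mathfrak{B}}$.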

□ 

Lemma 5 Containment of MMSNPc over active domain 
structures is polynomially reducible to containment of 
MMSNPc (over arbitrary structures). 

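Proof. Consider MMSNPc sentences $\Phi_1, \Phi_2$ with schema
$\mathbf{S}$ and constants $\mathbf{c}$. We apply the construction
from Lemma 4 to obtain MMSNPc sentences $\Phi_1'$ and $\Phi_2'$ with
the property that $\mathfrak{B} \models \Phi_i'$ for a general
$\mathbf{S} \cup \mathbf{c}$-structure $\mathfrak{B}$ if and only if
$\mathrm{dom}(\mathfrak{B}) \neq \mathrm{adom}(\mathfrak{B})$ or
$\mathrm{dom}(\mathfrak{B}) = \mathrm{adom}(\mathfrak{B})$ and
$\mathfrak{B} \models \Phi_i$ (for $i \in \{1, 2\}$). It is readily
verified that $\Phi_1 \subseteq \Phi_2$ for the class of active
domain structures if and only if $\Phi_1' \subseteq \Phi_2'$.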

□ 

By the preceding lemmas, we can choose to work with active domain
structures. It is then straightforward to relate the evaluation and
containment problems for coMMSNP queries with the corresponding
problems for MMSNPc sentences.

Lemma 6 The evaluation problem for coMMSNP is polynomially
equivalent to the evaluation problem for MMSNPc. Containment of
coMMSNP queries is polynomially reducible to containment of MMSNPc
sentences.

The next step, and the core technical contribution of this 
subsection, is to relate the evaluation and containment of 
MMSNPc sentences to the analogous problems for MMSNP 
sentences. To simplify the technical constructions, it will 
prove convenient to work with forbidden pattern problems 
[42, 30, 9].

We extend forbidden patterns problems to handle con- 
stant symbols, by simply substituting S U c-structures for 
S-structures in Definitions 1 and 2. We denote by FPPc the
class of forbidden patterns problems thus defined, and use 
FPP to refer to the restriction to structures without constant 
symbols. Note that both FPPc and FPP define problems over 
structures, not instances (although this distinction is irrele- 
vant in the absence of constant symbols). 

It was shown in [42] that MMSNP sentences and FPP have the same
expressive power. This result can be straightforwardly extended to
handle constant symbols:

Lemma 7 MMSNPc and FPPc have the same expressive power (over
structures with constant symbols).

By the previous lemma and the fact that FPP is a subset 
of FPPc, to show polynomial equivalence of MMSNPc and 
MMSNP it suffices to show that every problem in FPPc is 
polynomially equivalent to some problem in FPP. To formu- 
late the reductions, we will require some additional notation 
and terminology, which we introduce next. 

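Let $\mathbf{S}$ be a schema, $\mathbf{c} = \{c_1, \ldots, c_n\}$ be
a set of constant symbols, and $\mathbf{P} = \{P_1, \ldots, P_n\}$
be a set of unary predicates which do not appear in $\mathbf{S}$. We
will use $\mathbf{S}_c$ as an abbreviation for
$\mathbf{S} \cup \mathbf{c}$ and $\mathbf{S}_P$ in place of
$\mathbf{S} \cup \mathbf{P}$.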

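We define operations which allow us to transform
$\mathbf{S}_P$-structures into $\mathbf{S}_c$-structures, and
vice-versa. With every $\mathbf{S}_P$-structure $\mathfrak{B}$ with
$P_i^{\mathfrak{B}} \neq \emptyset$ for all $1 \leq i \leq n$, one
can associate the $\mathbf{S}_c$-structure $\mathfrak{B}^c$, called
the collapse of $\mathfrak{B}$, by factorizing through the
$P_i^{\mathfrak{B}}$, $1 \leq i \leq n$. Namely, let $\sim$ be the
smallest equivalence relation satisfying
$d, d' \in P_i^{\mathfrak{B}} \Rightarrow d \sim d'$. Then
$\mathrm{dom}(\mathfrak{B}^c)$ is
$\{[d] \mid d \in \mathrm{dom}(\mathfrak{B})\}$, where $[d]$ denotes
the equivalence class of $d$ w.r.t. $\sim$. For convenience, when
$[d] = \{d\}$, we will use $d$ in place of $[d]$. Set
$c_i^{\mathfrak{B}^c} = [d]$, for some $d \in P_i^{\mathfrak{B}}$,
and define $R^{\mathfrak{B}^c}$ in the obvious way such that the
mapping $g : d \mapsto [d]$ is an $\mathbf{S}$-homomorphism.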
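The collapse can be computed with a union-find pass over the sets interpreting the predicates in P. The following Python sketch is only an illustration: the function name `collapse` and the encoding of structures (a list of elements, a dictionary of relation tuples, a list of sets for the unary predicates) are assumptions made here, not notation from the paper.

```python
def collapse(domain, relations, preds):
    """Collapse of an S_P-structure (illustrative encoding).

    domain:    iterable of hashable elements
    relations: dict mapping each S-relation name to a set of tuples
    preds:     list of sets, the interpretations of P_1, ..., P_n
    Returns (classes, collapsed_relations, constants), or None when
    some P_i is empty, in which case the collapse is undefined.
    """
    domain = list(domain)
    if any(len(p) == 0 for p in preds):
        return None
    parent = {d: d for d in domain}

    def find(d):
        # Find the representative of d, compressing the path on the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Smallest equivalence relation identifying all members of each P_i.
    for p in preds:
        members = iter(p)
        first = next(members)
        for d in members:
            parent[find(d)] = find(first)

    cls = {d: frozenset(e for e in domain if find(e) == find(d))
           for d in domain}
    classes = set(cls.values())
    # The map d -> [d] is a homomorphism by construction.
    collapsed = {name: {tuple(cls[d] for d in tup) for tup in tuples}
                 for name, tuples in relations.items()}
    # The i-th constant is interpreted as the class of any member of P_i.
    constants = [cls[next(iter(p))] for p in preds]
    return classes, collapsed, constants
```

Note that overlapping sets are merged transitively, so two constants may end up denoting the same class.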

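For an $\mathbf{S}_c$-structure $\mathfrak{A}$, we define the
$\mathbf{S}_P$-structure $\mathfrak{A}^{P}$ which interprets the
predicates in $\mathbf{S}$ in the same way as $\mathfrak{A}$ and
interprets the predicates in $\mathbf{P}$ as follows:
$P_i^{\mathfrak{A}^{P}} = \{c_i^{\mathfrak{A}}\}$. With every
$\mathbf{S}_c$-structure $\mathfrak{B}$, one can associate a finite
set of finite $\mathbf{S}_P$-structures, $\mathfrak{B}^{ac}$, called
its anti-collapse, such that the following two properties hold:

1. for all $\mathbf{S}_P$-structures $\mathfrak{A}$:
$\mathfrak{B} \to \mathfrak{A}^c$ (and $\mathfrak{A}^c$ is defined)
iff there exists $\mathfrak{B}' \in \mathfrak{B}^{ac}$ such that
$\mathfrak{B}' \to \mathfrak{A}$.

2. for all $\mathbf{S}_c$-structures $\mathfrak{A}$:
$\mathfrak{B} \to \mathfrak{A}$ iff there exists
$\mathfrak{B}' \in \mathfrak{B}^{ac}$ such that
$\mathfrak{B}' \to \mathfrak{A}^{P}$.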

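To employ the anti-collapse $\mathfrak{B}^{ac}$ for the reduction of
FPPc to FPP, we require some properties from the construction of
$\mathfrak{B}^{ac}$ (cf. pages 43-45 of jij). The domain of each
$\mathfrak{B}' \in \mathfrak{B}^{ac}$ consists of
$\mathrm{dom}(\mathfrak{B}) \setminus \{c_1^{\mathfrak{B}}, \ldots, c_n^{\mathfrak{B}}\}$
(the unnamed individuals in $\mathfrak{B}$) together with the union
$\bigcup_{1 \leq i \leq n} D_i$ of fresh non-empty (but possibly not
mutually disjoint) sets $D_1, \ldots, D_n$ with
$P_i^{\mathfrak{B}'} = D_i$. Moreover, in Point 1 and Point 2 we have
the following more detailed statement:

(1a) if $h : \mathfrak{B} \to \mathfrak{A}^c$ (and $\mathfrak{A}^c$
is defined), and $g : \mathfrak{A} \to \mathfrak{A}^c$ is the
canonical homomorphism, then $h' : \mathfrak{B}' \to \mathfrak{A}$
can be chosen in such a way that $h'(d) = h(d)$ for all unnamed
individuals $d$ in $\mathfrak{B}$ and
$h'(d) \in g^{-1}(c_i^{\mathfrak{A}^c})$ for all $d \in D_i$.

(1b) if $h : \mathfrak{B}' \to \mathfrak{A}$, then
$h' : \mathfrak{B} \to \mathfrak{A}^c$ can be defined such that
$h'(c_i^{\mathfrak{B}}) = c_i^{\mathfrak{A}^c}$ and $h'(d) = h(d)$
if $d$ is not named.

(2b) if $h : \mathfrak{B}' \to \mathfrak{A}^{P}$, then
$h' : \mathfrak{B} \to \mathfrak{A}$ can be constructed by setting
$h'(d) = h(d)$ for all unnamed $d$.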

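In what follows, we will be interested in colorings of
$\mathbf{S}_P$-structures which respect the intuitive meaning of the
predicates $P_i$. A $C$-coloring $\mathfrak{B}[C]$ of an
$\mathbf{S}_P$-structure $\mathfrak{B}$ is said to be a uniform
$C$-coloring of $\mathfrak{B}$ if for every $1 \leq j \leq k$,
$d, d' \in P_j^{\mathfrak{B}}$ implies that $d$ and $d'$ have the
same color in $\mathfrak{B}[C]$. Given a set $\mathcal{G}$ of
$C$-colored $\mathbf{S}_P$-structures, we define
$\mathrm{Forb}^{u}(\mathcal{G})$ as the set of
$\mathbf{S}_P$-structures $\mathfrak{A}$ such that there exists a
uniform $C$-coloring $\mathfrak{A}[C]$ of $\mathfrak{A}$ such that
there exists no $\mathfrak{G} \in \mathcal{G}$ with
$\mathfrak{G} \to \mathfrak{A}[C]$.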

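We are now ready to present the reduction from FPPc to FPP. Suppose
that we are given an FPPc problem defined by the set $\mathcal{F}$
of $C$-colored $\mathbf{S}_c$-structures (where
$C = \{T_1, \ldots, T_k\}$). We construct a set $\mathcal{G}$ which
contains all uniform $C$-colored $\mathbf{S}_P$-structures
$\mathfrak{G}$ such that

• There exists $\mathfrak{F} \in \mathcal{F}$ and a member
$\mathfrak{F}'$ of the anti-collapse of the $\mathbf{S}_c$-reduct of
$\mathfrak{F}$ such that $\mathfrak{G}$ is the $C$-coloring of
$\mathfrak{F}'$ defined as follows:

(†) $d \in T_j^{\mathfrak{G}}$ iff $d$ is unnamed in $\mathfrak{F}$
and $d \in T_j^{\mathfrak{F}}$ or there exists $1 \leq i \leq n$
such that $d \in D_i$ and
$c_i^{\mathfrak{F}} \in T_j^{\mathfrak{F}}$.

(Note that we require that in the resulting structure
$T_j^{\mathfrak{G}} \cap T_{j'}^{\mathfrak{G}} = \emptyset$ for
$j \neq j'$, otherwise $\mathfrak{G}$ is not in $\mathcal{G}$).

It is easy to see that this construction guarantees that every
$\mathfrak{G} \in \mathcal{G}$ is such that
$P_i^{\mathfrak{G}} \neq \emptyset$ for every $1 \leq i \leq n$.

We let $\mathcal{G}_u = \mathcal{G} \cup \mathcal{U}$, where
$\mathcal{U}$ is the set of all $\mathbf{S}_P \cup C$-structures of
the form $\{P_i(d), P_i(e), T_j(d), T_\ell(e)\}$ with
$1 \leq i \leq n$ and $1 \leq j < \ell \leq k$.

Notice that $\mathrm{Forb}^{u}(\mathcal{G}) = \mathrm{Forb}(\mathcal{G}_u)$.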
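The role of the extra forbidden patterns, which pair two elements of the same unary predicate with two different colors, can be illustrated with a small check. The sketch below is hypothetical: colors are encoded as integers, a coloring as a dictionary, and the names `is_uniform` and `matches_uniformity_pattern` are assumptions introduced here. It shows that a coloring is uniform exactly when none of these two-element patterns maps into it.

```python
def is_uniform(coloring, preds):
    """True iff all elements of each set P_j carry the same color."""
    return all(len({coloring[d] for d in p}) <= 1 for p in preds)


def matches_uniformity_pattern(coloring, preds):
    """True iff some pattern {P_i(d), P_i(e), T_j(d), T_l(e)} with j < l
    maps homomorphically into the colored structure.

    Because every element has exactly one color, d and e must be sent
    to two distinct elements of the same P_i with different colors.
    """
    for p in preds:
        for d in p:
            for e in p:
                if coloring[d] < coloring[e]:
                    return True
    return False
```

Under this encoding, `is_uniform(c, P)` and `not matches_uniformity_pattern(c, P)` always agree, which mirrors the observation that uniform colorings can be enforced by forbidden patterns alone.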



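Lemma 8 FPPc is polynomially equivalent to FPP. Specifically:

• For all $\mathbf{S}_P$-structures $\mathfrak{A}$,
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_u)$ iff $\mathfrak{A}^c$
is undefined or $\mathfrak{A}^c \in \mathrm{Forb}(\mathcal{F})$;

• For all $\mathbf{S}_c$-structures $\mathfrak{A}$,
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{F})$ iff
$\mathfrak{A}^{P} \in \mathrm{Forb}(\mathcal{G}_u)$.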

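Proof. First let $\mathfrak{A}$ be an $\mathbf{S}_P$-structure such
that $\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_u)$. There exists a
uniform $C$-colored expansion $\mathfrak{A}[C]$ of $\mathfrak{A}$
such that there exists no $\mathfrak{G} \in \mathcal{G}$ with
$\mathfrak{G} \to \mathfrak{A}[C]$. Assume the collapse
$\mathfrak{A}^c$ is defined (i.e.,
$P_i^{\mathfrak{A}} \neq \emptyset$ for $1 \leq i \leq n$). We want
to show $\mathfrak{A}^c \in \mathrm{Forb}(\mathcal{F})$. By
uniformity of $\mathfrak{A}[C]$, we obtain a $C$-colored
$\mathbf{S}_c$-structure $\mathfrak{A}^c[C]$ extending
$\mathfrak{A}^c$ by setting $d \in T_j^{\mathfrak{A}^c[C]}$ iff $d$
is unnamed and $d \in T_j^{\mathfrak{A}[C]}$ or
$d = c_i^{\mathfrak{A}^c}$ and
$P_i^{\mathfrak{A}[C]} \subseteq T_j^{\mathfrak{A}[C]}$. Assume for
a contradiction that $h : \mathfrak{F} \to \mathfrak{A}^c[C]$ for
$\mathfrak{F} \in \mathcal{F}$. Then $h$ is a homomorphism from the
$\mathbf{S}_c$-reduct $\mathfrak{F}^{-}$ of $\mathfrak{F}$ to the
$\mathbf{S}_c$-reduct $\mathfrak{A}^c$ of $\mathfrak{A}^c[C]$. By
(1a), we find $\mathfrak{F}' \in (\mathfrak{F}^{-})^{ac}$ and
$h' : \mathfrak{F}' \to \mathfrak{A}$ such that $h'(d) = h(d)$ for
all unnamed individuals $d$ in $\mathfrak{F}$ and
$h'(d) \in g^{-1}(c_i^{\mathfrak{A}^c})$ for all $d \in D_i$. Let
$\mathfrak{F}'[C]$ be the $C$-coloring of $\mathfrak{F}'$ defined
with (†). We obtain the desired contradiction by showing that $h'$
is an $\mathbf{S}_P \cup C$-homomorphism from $\mathfrak{F}'[C]$ to
$\mathfrak{A}[C]$. Let $d \in \mathrm{dom}(\mathfrak{F}')$ and
$d \in T_j^{\mathfrak{F}'[C]}$. If $d$ is unnamed in $\mathfrak{F}$,
then $d \in T_j^{\mathfrak{F}'[C]}$ implies that
$d \in T_j^{\mathfrak{F}}$. Hence $h(d) \in T_j^{\mathfrak{A}^c[C]}$
and $h'(d) = h(d) \in T_j^{\mathfrak{A}[C]}$. If $d \in D_i$, then
$d \in T_j^{\mathfrak{F}'[C]}$ implies
$c_i^{\mathfrak{F}} \in T_j^{\mathfrak{F}}$, hence
$c_i^{\mathfrak{A}^c} \in T_j^{\mathfrak{A}^c[C]}$ and
$P_i^{\mathfrak{A}[C]} \subseteq T_j^{\mathfrak{A}[C]}$. From
$h'(d) \in g^{-1}(c_i^{\mathfrak{A}^c})$ we cannot infer that
$h'(d) \in P_i^{\mathfrak{A}[C]}$, but only that there exists a
sequence $P_{i_1}, \ldots, P_{i_p}$ of predicates from
$\{P_1, \ldots, P_n\}$ such that $h'(d) \in P_{i_1}^{\mathfrak{A}}$,
$P_{i_p} = P_i$, and
$P_{i_k}^{\mathfrak{A}} \cap P_{i_{k+1}}^{\mathfrak{A}} \neq \emptyset$
for every $1 \leq k < p$. By uniformity of $\mathfrak{A}[C]$ and
$P_i^{\mathfrak{A}[C]} \subseteq T_j^{\mathfrak{A}[C]}$, we obtain
$P_{i_1}^{\mathfrak{A}} \subseteq T_j^{\mathfrak{A}[C]}$, hence
$h'(d) \in T_j^{\mathfrak{A}[C]}$.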

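Conversely, if $\mathfrak{A}^c$ is undefined, then
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_u)$ since
$P_i^{\mathfrak{G}} \neq \emptyset$ for all
$\mathfrak{G} \in \mathcal{G}$ and $1 \leq i \leq n$, and so any
uniform $C$-coloring of $\mathfrak{A}$ will avoid $\mathcal{G}_u$.
Assume now that $\mathfrak{A}^c \in \mathrm{Forb}(\mathcal{F})$.
There exists a $C$-colored expansion $\mathfrak{A}^c[C]$ of
$\mathfrak{A}^c$ such that there exists no
$\mathfrak{F} \in \mathcal{F}$ with
$\mathfrak{F} \to \mathfrak{A}^c[C]$. We define a (uniform)
$C$-colored expansion $\mathfrak{A}[C]$ of $\mathfrak{A}$ in the
obvious way: let $g : \mathfrak{A} \to \mathfrak{A}^c$ be the
canonical mapping and set
$T_j^{\mathfrak{A}[C]} = g^{-1}(T_j^{\mathfrak{A}^c[C]})$, for
$1 \leq j \leq k$. Assume for a contradiction that
$\mathfrak{G} \to \mathfrak{A}[C]$ for
$\mathfrak{G} \in \mathcal{G}$. Then $\mathfrak{G}$ is obtained from
some $\mathfrak{F} \in \mathcal{F}$ and some member $\mathfrak{F}'$
of the anti-collapse of the $\mathbf{S}_c$-reduct of $\mathfrak{F}$
as described in (†). Assume $h : \mathfrak{G} \to \mathfrak{A}[C]$.
Then $h : \mathfrak{F}' \to \mathfrak{A}$ and so, by (1b), there
exists $h' : \mathfrak{F}^{-} \to \mathfrak{A}^c$ that can be defined
such that $h'(c_i^{\mathfrak{F}^{-}}) = c_i^{\mathfrak{A}^c}$ and
$h'(d) = h(d)$ if $d$ is not named, where $\mathfrak{F}^{-}$ is the
$\mathbf{S}_c$-reduct of $\mathfrak{F}$. We derive a contradiction
by showing that $h'$ is a homomorphism from $\mathfrak{F}$ to
$\mathfrak{A}^c[C]$. First suppose that
$d \in T_j^{\mathfrak{F}}$, and $d$ is unnamed in $\mathfrak{F}$.
Then $d \in T_j^{\mathfrak{G}}$, hence
$h'(d) = h(d) \in T_j^{\mathfrak{A}[C]}$. It follows from the
definition of $T_j^{\mathfrak{A}[C]}$ that
$h'(d) \in T_j^{\mathfrak{A}^c[C]}$. Next consider the case where
$c_i^{\mathfrak{F}} \in T_j^{\mathfrak{F}}$. Then there must exist
$d$ such that $d \in T_j^{\mathfrak{G}}$ and
$d \in P_i^{\mathfrak{G}}$. It follows that there is $d' = h(d)$
with $d' \in T_j^{\mathfrak{A}[C]}$ and
$d' \in P_i^{\mathfrak{A}[C]}$. The definition of
$T_j^{\mathfrak{A}[C]}$ together with
$g(d') = c_i^{\mathfrak{A}^c}$ yields
$h'(c_i^{\mathfrak{F}}) = c_i^{\mathfrak{A}^c} \in T_j^{\mathfrak{A}^c[C]}$.

The second statement follows easily from the first, since for every
$\mathbf{S}_c$-structure $\mathfrak{A}$, we have
$\mathfrak{A} = (\mathfrak{A}^{P})^c$. □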

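Lemma 9 Containment of FPPc is polynomially reducible to containment
of FPP.

Proof. Consider $\mathrm{Forb}(\mathcal{F}_1)$ and
$\mathrm{Forb}(\mathcal{F}_2)$, both over $\mathbf{S}_c$. Let
$\mathcal{G}_{u,1}$ and $\mathcal{G}_{u,2}$ be the corresponding FPPs
over schema $\mathbf{S}_P$, which satisfy the statements in Lemma 8.
We claim that
$\mathrm{Forb}(\mathcal{F}_1) \subseteq \mathrm{Forb}(\mathcal{F}_2)$
iff
$\mathrm{Forb}(\mathcal{G}_{u,1}) \subseteq \mathrm{Forb}(\mathcal{G}_{u,2})$.

For the first direction, suppose that
$\mathrm{Forb}(\mathcal{F}_1) \subseteq \mathrm{Forb}(\mathcal{F}_2)$.
Let $\mathfrak{A}$ be an $\mathbf{S}_P$-structure such that
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_{u,1})$. If
$\mathfrak{A}^c$ is undefined, then we immediately obtain
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_{u,2})$. Otherwise, we
have $\mathfrak{A}^c \in \mathrm{Forb}(\mathcal{F}_1)$, and hence
$\mathfrak{A}^c \in \mathrm{Forb}(\mathcal{F}_2)$ and
$\mathfrak{A} \in \mathrm{Forb}(\mathcal{G}_{u,2})$.

For the second direction, suppose that
$\mathrm{Forb}(\mathcal{G}_{u,1}) \subseteq \mathrm{Forb}(\mathcal{G}_{u,2})$,
and let $\mathfrak{B}$ be an $\mathbf{S}_c$-structure such that
$\mathfrak{B} \in \mathrm{Forb}(\mathcal{F}_1)$. Then applying the
previous lemma, we have
$\mathfrak{B}^{P} \in \mathrm{Forb}(\mathcal{G}_{u,1})$, hence
$\mathfrak{B}^{P} \in \mathrm{Forb}(\mathcal{G}_{u,2})$. Again
applying the lemma, we obtain
$\mathfrak{B} \in \mathrm{Forb}(\mathcal{F}_2)$. □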

By combining in a straightforward manner Lemmas 4 to 9, we obtain
Theorem 20.

B.2 Proofs for Section g] 

Lemma 2 coMMSNP has the same expressive power as
MDDlog. 

Proof. Take a query $q_\Phi \in$ coMMSNP with
$\Phi = \exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m \varphi$
an MMSNP formula with free variables $y_1, \ldots, y_k$. We can
assume w.l.o.g. that all implications
$\psi = \alpha_1 \wedge \cdots \wedge \alpha_n \to \beta_1 \vee \cdots \vee \beta_m$
in $\varphi$ satisfy the following properties: (i) $n > 0$ and (ii)
each variable that occurs in a $\beta_i$ atom also occurs in an
$\alpha_i$ atom. In fact, we can achieve both (i) and (ii) by
replacing violating implications $\psi$ with the set of implications
$\psi'$ that can be obtained from $\psi$ by adding, for each variable
$x$ that occurs only in the head of $\psi$, an atom $S(\mathbf{x})$
where $S$ is a predicate that occurs in $\Phi$ and $\mathbf{x}$ is a
tuple of variables that contains $x$ once and otherwise only fresh
variables that do not occur in $\Phi$ (cf. the expansion of
$\mathrm{adom}(x)$ atoms in disjunctive datalog rules). Define an
MDDlog program $\Pi_\Phi$ that consists of all implications in
$\varphi$ whose head is not $\bot$ plus a rule
$\vartheta \to \mathsf{goal}(y_1, \ldots, y_k)$ for each implication
$\vartheta \to \bot$ in $\varphi$. It can be proved that
$q_{\Phi,\mathbf{S}} = q_{\Pi_\Phi,\mathbf{S}}$ for all schemas
$\mathbf{S}$. Finally, it is straightforward to remove the
equalities from the rule bodies in $\Pi_\Phi$.

Conversely, let $\Pi$ be a $k$-ary MDDlog program and assume
w.l.o.g. that each rule uses a disjoint set of variables. Reserve
fresh variables $y_1, \ldots, y_k$ as free variables for the desired
MMSNP formula, and let $X_1, \ldots, X_n$ be the IDB predicates in
$\Pi$ and $x_1, \ldots, x_m$ the FO-variables in $\Pi$ that do not
occur in the goal predicate. Set
$\Phi_\Pi = \exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m \varphi$
where $\varphi$ is the conjunction of all non-goal rules in $\Pi$
plus the implication $\vartheta' \to \bot$ for each rule
$\vartheta \to \mathsf{goal}(\mathbf{x})$ in $\Pi$. Here,
$\vartheta'$ is obtained from $\vartheta$ by replacing each variable
$x \in \mathbf{x}$ whose left-most occurrence in the rule head is in
the $i$-th position with $y_i$, and then conjunctively adding
$y_i = y_j$ whenever the $i$-th and $j$-th position in the rule head
have the same variable. It can be proved that
$q_{\Pi,\mathbf{S}} = q_{\Phi_\Pi,\mathbf{S}}$ for all schemas
$\mathbf{S}$. □
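Both directions of this translation are purely syntactic, which a short sketch can make concrete. The Python code below is an illustration under assumed encodings: an atom is a pair of a predicate name and a tuple of variable names, an implication is a pair of body and head tuples, and an empty head stands for falsity. The function names are hypothetical. For the converse direction, equalities are added only between each position and the left-most position holding the same variable, which generates the same equivalences as adding all pairs.

```python
def mmsnp_to_mddlog(implications, free_vars):
    """Turn each implication with head 'false' into a goal rule.

    Implications with a non-empty head are kept unchanged.
    """
    rules = []
    for body, head in implications:
        if head:
            rules.append((body, head))
        else:
            rules.append((body, (("goal", tuple(free_vars)),)))
    return rules


def goal_rule_to_constraint(body, goal_args, free_vars):
    """Turn a rule 'body -> goal(goal_args)' into 'body' -> false'."""
    first = {}
    for i, x in enumerate(goal_args):
        first.setdefault(x, i)          # left-most position of x
    rename = {x: free_vars[i] for x, i in first.items()}
    new_body = [(p, tuple(rename.get(a, a) for a in args))
                for p, args in body]
    for j, x in enumerate(goal_args):
        i = first[x]
        if i != j:                      # repeated head variable
            new_body.append(("=", (free_vars[i], free_vars[j])))
    return (tuple(new_body), ())
```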

Lemma 3 coGMSNP is at least as expressive as frontier-guarded DDlog
and strictly more expressive than coMMSNP.

Proof. The proof of the first part follows the lines of the proof of
Lemma 2 and is omitted. It thus remains to show that coGMSNP is
strictly more expressive than coMMSNP. Note first that it is at
least as expressive: we can convert any MMSNP formula into an
equivalent one satisfying Conditions (i) and (ii) from the proof of
Lemma 2, and clearly every such MMSNP formula is also a GMSNP
formula. To see that coGMSNP is indeed strictly more expressive than
coMMSNP, note that by Proposition 1 there is a (GF,UCQ) query $q$
that is not expressible in frontier-guarded DDlog. By Lemma 2 and
since MDDlog is a fragment of frontier-guarded DDlog, $q$ is not
expressible in coMMSNP; by Theorem 7 and the first part of Lemma 3,
$q$ is expressible in coGMSNP. □

Theorem 9 For every coGMSNP query, there is a coMMSNP query that is
polynomially equivalent (and vice versa). Consequently, (GF,UCQ) and
(GNFO,UCQ) have a dichotomy between PTime and coNP iff the
Feder-Vardi conjecture holds.

In the following, we prove Theorem 9. By Lemma 3, for every coMMSNP
query $q$, there is an equivalent coGMSNP query $q'$. It thus
remains to prove that for every coGMSNP query, there is a coMMSNP
query that is polynomially equivalent. For clarity, we consider here
the truth of GMSNP and MMSNP formulas in finite structures, instead
of the entailment of coGMSNP and coMMSNP queries in instances. In
other words, we do not insist that the active domain is identical to
the domain and avoid the complements involved in coGMSNP and
coMMSNP. A central ingredient of our construction is the notion of
an incidence graph, defined as follows.

Definition 3 Let $\mathbf{S}$ be a schema with relations of maximal
arity $k$ and fix a set $V$ of $n$ variables. An $\mathbf{S}$-label
is a set $L$ of atoms $S(\mathbf{x})$ with $S \in \mathbf{S}$ and
$\mathbf{x}$ from $V$. We use $\mathrm{var}(L)$ to denote the
variables in $L$. Let $\mathrm{lab}(\mathbf{S})$ denote the set of
all $\mathbf{S}$-labels. The incidence schema $\mathbf{S}_I$ for
$\mathbf{S}$ comprises a unary predicate $\mathsf{elem}$, a binary
predicate $p_x$ for every $x \in V$, and a unary predicate for every
$L \in \mathrm{lab}(\mathbf{S})$, of the same name. An
$\mathbf{S}$-incidence graph is a structure $\mathfrak{G}$ over the
schema $\mathbf{S}_I$ such that the following conditions are
satisfied:

1. every element of $\mathfrak{G}$ satisfies at most one unary
predicate;

2. if $(d, e) \in p_x^{\mathfrak{G}}$, then
$d \notin L^{\mathfrak{G}}$ for all
$L \in \mathrm{lab}(\mathbf{S})$ and
$e \notin \mathsf{elem}^{\mathfrak{G}}$;

3. if $(d, e) \in p_x^{\mathfrak{G}}$ and $e \in L^{\mathfrak{G}}$,
then $x \in \mathrm{var}(L)$;

4. if $(d, e) \in p_x^{\mathfrak{G}} \cap p_y^{\mathfrak{G}}$, then
$x = y$.

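Let $\mathfrak{B}$ be an $\mathbf{S}$-structure. For any subset
$B \subseteq \mathrm{dom}(\mathfrak{B})$ of cardinality at most $k$,
we fix an injection $\delta_B : B \to V$. $\mathfrak{B}$ can be
transformed into an $\mathbf{S}$-incidence graph
$\mathfrak{G}_{\mathfrak{B}}$ as follows.

• the elements of $\mathfrak{G}_{\mathfrak{B}}$ are the elements of
$\mathrm{dom}(\mathfrak{B})$ and the facts in $\mathfrak{B}$;

• for all elements $d$ of $\mathrm{dom}(\mathfrak{B})$, we have
$d \in \mathsf{elem}^{\mathfrak{G}_{\mathfrak{B}}}$;

• for all facts $\alpha \in \mathfrak{B}$ that involve elements
$B \subseteq \mathrm{dom}(\mathfrak{B})$, we have

- $\alpha \in (L_\alpha)^{\mathfrak{G}_{\mathfrak{B}}}$ where
$L_\alpha$ is the $\mathbf{S}$-label that consists of all atoms
$S(\delta_B(d_1), \ldots, \delta_B(d_m))$ such that
$S(d_1, \ldots, d_m) \in \mathfrak{B}$ and
$\{d_1, \ldots, d_m\} \subseteq B$

- $(d, \alpha) \in (p_x)^{\mathfrak{G}_{\mathfrak{B}}}$ when
$\delta_B(d) = x$.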

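Conversely, every incidence graph $\mathfrak{G}$ can be transformed
into an $\mathbf{S}$-structure $\mathfrak{B}_{\mathfrak{G}}$ as
follows:

• the universe of $\mathfrak{B}_{\mathfrak{G}}$ is
$\mathsf{elem}^{\mathfrak{G}}$;

• $S^{\mathfrak{B}_{\mathfrak{G}}}$ with $S$ of arity $m$ consists
of all tuples $(d_1, \ldots, d_m)$ for which there is an
$\mathbf{S}$-label $L$ and an $e \in L^{\mathfrak{G}}$ such that
$S(x_1, \ldots, x_m) \in L$,
$d_i \in \mathsf{elem}^{\mathfrak{G}}$, and
$(d_i, e) \in p_{x_i}^{\mathfrak{G}}$ for $1 \leq i \leq m$.

Note that both transformations are polynomial when $\mathbf{S}$ is
assumed to be of constant size. Also note that, for a given finite
$\mathbf{S}_I$-structure $\mathfrak{G}$, it can be checked in
polynomial time whether $\mathfrak{G}$ is an incidence graph.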
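The two transformations can be sketched in a few lines of Python. The encoding is an assumption made for illustration: a fact is a pair of a relation name and a tuple of elements, the injection for a fact is fixed by sorting its elements, labels are stored per fact node, and an edge is a triple of a variable, an element, and a fact node. The function names are hypothetical.

```python
from itertools import product


def to_incidence_graph(domain, facts, variables):
    """Build the incidence graph of a structure (illustrative encoding)."""
    elem = set(domain)
    labels = {}    # fact node -> label (frozenset of atoms over variables)
    edges = set()  # (x, d, alpha) encodes (d, alpha) in p_x
    for alpha in facts:
        involved = sorted(set(alpha[1]), key=repr)
        assert len(involved) <= len(variables), "too few variables"
        delta = dict(zip(involved, variables))   # injection into V
        labels[alpha] = frozenset(
            (name, tuple(delta[e] for e in tup))
            for name, tup in facts
            if set(tup) <= set(involved)
        )
        for d in involved:
            edges.add((delta[d], d, alpha))
    return elem, labels, edges


def from_incidence_graph(elem, labels, edges):
    """Recover a structure from an incidence graph."""
    facts = set()
    for node, label in labels.items():
        incoming = {}
        for x, d, target in edges:
            if target == node and d in elem:
                incoming.setdefault(x, set()).add(d)
        for name, xs in label:
            choices = [sorted(incoming.get(x, ()), key=repr) for x in xs]
            for tup in product(*choices):
                facts.add((name, tup))
    return set(elem), facts
```

On graphs produced by `to_incidence_graph`, applying `from_incidence_graph` returns the original domain and facts.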

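Let $\Phi = \exists X_1 \cdots \exists X_n \forall x_1 \cdots \forall x_m \varphi$
be a GMSNP formula over the schema $\mathbf{S}$ and with $k$ free
variables. We construct an MMSNP formula $\Psi$ over the incidence
schema $\mathbf{S}_I$ with $k$ free variables such that

1. for every $\mathbf{S}$-structure $\mathfrak{B}$ and all
$a_1, \ldots, a_k \in \mathrm{dom}(\mathfrak{B})$, we have
$\mathfrak{B} \models \Phi[a_1, \ldots, a_k]$ iff
$\mathfrak{G}_{\mathfrak{B}} \models \Psi[a_1, \ldots, a_k]$;

2. for every $\mathbf{S}_I$-structure $\mathfrak{G}$ and all
$a_1, \ldots, a_k \in \mathrm{dom}(\mathfrak{G})$, we have
$\mathfrak{G} \models \Psi[a_1, \ldots, a_k]$ iff
$a_1, \ldots, a_k \in \mathrm{dom}(\mathfrak{B}_{\mathfrak{G}})$,
$\mathfrak{G}$ is an incidence graph, and
$\mathfrak{B}_{\mathfrak{G}} \models \Phi[a_1, \ldots, a_k]$ (or
$\mathfrak{B}_{\mathfrak{G}}$ is empty).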

Before constructing the desired MMSNP formula $\Psi$, let us briefly
argue that this yields the intended result, i.e., that
$q_{\Phi,\mathbf{S}}$ is polynomially equivalent to
$q_{\Psi,\mathbf{S}_I}$. First assume that we are given an
$\mathbf{S}$-instance $\mathfrak{D}$ and
$a_1, \ldots, a_k \in \mathrm{adom}(\mathfrak{D})$, to decide
whether $(a_1, \ldots, a_k) \in q_{\Phi,\mathbf{S}}(\mathfrak{D})$.
We check whether $\mathfrak{D}$ is empty (then $k = 0$ and $\Phi$
has no answer variables) and return false if this is the case.
Otherwise, we compute $\mathfrak{G}_{\mathfrak{D}}$ (which then is
an $\mathbf{S}_I$-instance) and return the answer of
$(a_1, \ldots, a_k) \in q_{\Psi,\mathbf{S}_I}(\mathfrak{G}_{\mathfrak{D}})$,
which is correct by Point 1 above. Conversely, assume we are given
an $\mathbf{S}_I$-instance $\mathfrak{G}$ and
$a_1, \ldots, a_k \in \mathrm{adom}(\mathfrak{G})$, to decide
whether $(a_1, \ldots, a_k) \in q_{\Psi,\mathbf{S}_I}(\mathfrak{G})$.
Check whether $\mathfrak{G}$ is an incidence graph and return true
if this is not the case. Construct $\mathfrak{B}_{\mathfrak{G}}$,
which is an $\mathbf{S}$-instance. If $a_1, \ldots, a_k$ are not in
$\mathrm{dom}(\mathfrak{B}_{\mathfrak{G}})$, then return true. If
$\mathfrak{B}_{\mathfrak{G}}$ is empty (then $k = 0$ and $\Psi$ has
no answer variables), return false. Finally, return the result of
$(a_1, \ldots, a_k) \in q_{\Phi,\mathbf{S}}(\mathfrak{B}_{\mathfrak{G}})$.
All relevant steps can be implemented in polynomial time.

Now for the construction of $\Psi$. Let $\mathbf{S}^+$ be the
extension of $\mathbf{S}$ with all SO-variables in $\Phi$, viewed
here as relation symbols. An $\mathbf{S}^+$-label $t$ is a type for
$\Phi$ if it satisfies all implications in $\Phi$. We use
$\mathrm{tp}(\Phi)$ to denote the set of all types for $\Phi$. For
each $L \in \mathrm{lab}(\mathbf{S})$, we use $\mathrm{tp}_L(\Phi)$
to denote the types $t$ that are compatible with $L$, that is, $L$
is exactly the restriction of $t$ to the schema $\mathbf{S}$.

The desired MMSNP formula $\Psi$ has one existential SO-quantifier
for each monadic SO-variable $P_t$ with $t \in \mathrm{tp}(\Phi)$
and one universal FO-quantifier for each variable from
$\{u, v\} \uplus V \uplus V'$, where $V$ is the set of variables
fixed in Definition 3 and $V'$ is a set of $2|\Phi|$ variables,
$|\Phi|$ the number of symbols needed to write $\Phi$. The free
variables of $\Psi$ are exactly the free variables of $\Phi$,
denoted $\mathrm{free}(\Psi)$. The body of $\Psi$ consists of the
following implications.

All free variables are sent to domain elements, not to facts:

• for all $x \in \mathrm{free}(\Psi)$ and
$L \in \mathrm{lab}(\mathbf{S})$, the implication
$L(x) \to \bot$.

Condition 1 of incidence graphs is satisfied:

• for all $L \in \mathrm{lab}(\mathbf{S})$, the implication
$L(u) \wedge \mathsf{elem}(u) \to \bot$.

• for all $L, L' \in \mathrm{lab}(\mathbf{S})$ with $L \neq L'$, the
implication $L(u) \wedge L'(u) \to \bot$.

Condition 2 of incidence graphs is satisfied:

• for all $x \in V$, the implication
$p_x(u, v) \wedge \mathsf{elem}(v) \to \bot$.

• for all $x \in V$ and $L \in \mathrm{lab}(\mathbf{S})$, the
implication $p_x(u, v) \wedge L(u) \to \bot$.

Condition 3 of incidence graphs is satisfied:

• for all $L \in \mathrm{lab}(\mathbf{S})$ and
$x \in V \setminus \mathrm{var}(L)$, the implication
$p_x(u, v) \wedge L(v) \to \bot$.

Condition 4 of incidence graphs is satisfied:

• for all distinct $x, y \in V$, the implication
$p_x(u, v) \wedge p_y(u, v) \to \bot$.
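Read operationally, these implications forbid exactly the local violations of Conditions 1 to 4, so they correspond to a direct polynomial-time check. The following Python sketch illustrates that check under an assumed encoding: `unary` maps the name `"elem"` and each label name to a set of elements, `edges` maps each variable to the set of pairs in the corresponding binary predicate, and `var_of` maps a label name to its variables. The function name is hypothetical.

```python
def is_incidence_graph(unary, edges, var_of):
    """Check Conditions 1-4 of incidence graphs (illustrative encoding)."""
    # Condition 1: no element satisfies two unary predicates.
    seen = set()
    for members in unary.values():
        if seen & members:
            return False
        seen |= members

    elem = unary.get("elem", set())
    label_of = {e: name for name, members in unary.items()
                if name != "elem" for e in members}
    pairs_seen = set()
    for x, pairs in edges.items():
        for d, e in pairs:
            # Condition 2: edges start outside labels and end outside elem.
            if d in label_of or e in elem:
                return False
            # Condition 3: an edge p_x into a node labeled L needs x in var(L).
            if e in label_of and x not in var_of[label_of[e]]:
                return False
            # Condition 4: a pair carries at most one edge variable.
            if (d, e) in pairs_seen:
                return False
            pairs_seen.add((d, e))
    return True
```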

Via the predicates $P_t$, we label every element that represents a
fact with a unique type. We restrict ourselves to only those facts
$\alpha$ that are "active" in the sense that they have at least one
incoming $p_x$-edge for every $x \in \mathrm{var}(L(\alpha))$.
Specifically, $\mathrm{act}_S(u)$, with $S \subseteq V$, is an
abbreviation for
$\bigwedge_{x \in S} (p_x(x, u) \wedge \mathsf{elem}(x))$; we put:

• for every $L \in \mathrm{lab}(\mathbf{S})$, the implication
$\mathrm{act}_{\mathrm{var}(L)}(u) \wedge L(u) \to \bigvee_{t \in \mathrm{tp}_L(\Phi)} P_t(u)$.

• for all types $t, t' \in \mathrm{tp}(\Phi)$ with $t \neq t'$, the
implication $P_t(u) \wedge P_{t'}(u) \to \bot$.

We ensure that the assigned types are compatible with each other:

• for all types $t, t' \in \mathrm{tp}(\Phi)$ and
$S(x_1, \ldots, x_n) \in t$, and all $y_1, \ldots, y_n \in V$ such
that $S(y_1, \ldots, y_n)$ is not in $t'$, the implication

$P_t(u) \wedge P_{t'}(v) \wedge \mathrm{act}_{\mathrm{var}(t)}(u) \wedge \mathrm{act}_{\mathrm{var}(t')}(v) \wedge \bigwedge_{1 \leq i \leq n} \big(\mathsf{elem}(w_i) \wedge p_{x_i}(w_i, u) \wedge p_{y_i}(w_i, v)\big) \to \bot$

• We view each implication in the body $\varphi$ of $\Phi$ as a
formula of the form $q \to \bot$ with $q$ a CQ in which
$\mathbf{S}^+ \setminus \mathbf{S}$-atoms (but not
$\mathbf{S}$-atoms) may occur negated. An $\mathbf{S}^+$-incidence
graph $\mathfrak{G}$ is called proper if all labels from
$\mathrm{lab}(\mathbf{S}^+)$ that occur in $\mathfrak{G}$ are types
for $\Phi$. For an $\mathbf{S}^+$-incidence graph $\mathfrak{G}$ and
a CQ $q$, we write $\mathfrak{G} \models q$ if there is a match of
$q$ in $\mathfrak{B}_{\mathfrak{G}}$. The width of $q$, denoted
$\mathrm{width}(q)$, is the number of variables in $q$.

Let $q \to \bot$ be in $\varphi$ and let $x_1, \ldots, x_\ell$ be
the free variables of $\Psi$ that occur in $q$. Moreover, let
$\mathfrak{G}$ be a proper $\mathbf{S}^+$-incidence graph with at
most $\mathrm{width}(q)$ atom nodes and at most $\mathrm{width}(q)$
element nodes, and selected distinct elements
$a_1, \ldots, a_\ell \in \mathsf{elem}^{\mathfrak{G}}$ such that
$\mathfrak{B}_{\mathfrak{G}} \models q[a_1, \ldots, a_\ell]$. Fix a
function $\delta$ that maps each element of $\mathfrak{G}$ to a
variable such that $a_1, \ldots, a_\ell$ are mapped to the free
variables $x_1, \ldots, x_\ell$ and all other elements are mapped to
universally quantified variables. We add the implication
$\vartheta \to \bot$, where $\vartheta$ is the conjunction of the
following atoms:

- for every element $e \in L^{\mathfrak{G}}$,
$L \in \mathrm{tp}(\Phi)$, the atom $P_L(\delta(e))$;

- for every element $d \in \mathsf{elem}^{\mathfrak{G}}$, the atom
$\mathsf{elem}(\delta(d))$;

- for every edge $(d, e) \in p_x^{\mathfrak{G}}$, the atom
$p_x(\delta(d), \delta(e))$.

It is now possible to prove that, for the constructed MMSNP formula
$\Psi$, the desired properties stated in Points 1 and 2 above are
indeed satisfied.

Theorem 21 Query containment is decidable for coGM- 
SNP. 

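Proof. Let $q_{\Phi_1,\mathbf{S}}$ and $q_{\Phi_2,\mathbf{S}}$ be
coGMSNP queries for which we want to decide containment, and let $n$
be their arity. By Theorem 20 it suffices to show that
$q_{\Phi_1,\mathbf{S}} \subseteq q_{\Phi_2,\mathbf{S}}$ iff
$q_{\Psi_1,\mathbf{S}_I} \subseteq q_{\Psi_2,\mathbf{S}_I}$, where
$\Psi_1$ and $\Psi_2$ are the MMSNP formulas constructed for
$\Phi_1$ and $\Phi_2$ in the proof of Theorem 9.

"if". Assume
$q_{\Psi_1,\mathbf{S}_I} \subseteq q_{\Psi_2,\mathbf{S}_I}$ and let
$\mathfrak{D}$ be an $\mathbf{S}$-instance and
$(a_1, \ldots, a_n) \in q_{\Phi_1,\mathbf{S}}(\mathfrak{D})$. The
latter implies that $\mathfrak{D}$ is non-empty. By Point 1 from the
proof of Theorem 9, we have
$(a_1, \ldots, a_n) \in q_{\Psi_1,\mathbf{S}_I}(\mathfrak{G}_{\mathfrak{D}})$
and $\mathfrak{G}_{\mathfrak{D}}$ is an $\mathbf{S}_I$-instance.
Thus
$(a_1, \ldots, a_n) \in q_{\Psi_2,\mathbf{S}_I}(\mathfrak{G}_{\mathfrak{D}})$
and one more application of Point 1 yields
$(a_1, \ldots, a_n) \in q_{\Phi_2,\mathbf{S}}(\mathfrak{D})$.

"only if". Assume
$q_{\Phi_1,\mathbf{S}} \subseteq q_{\Phi_2,\mathbf{S}}$ and let
$\mathfrak{G}$ be an $\mathbf{S}_I$-instance and
$(a_1, \ldots, a_n) \in q_{\Psi_1,\mathbf{S}_I}(\mathfrak{G})$. Then
$\mathfrak{G}$ is non-empty. By Point 2 from the proof of Theorem 9,
we get
$(a_1, \ldots, a_n) \in q_{\Psi_2,\mathbf{S}_I}(\mathfrak{G})$ (and
thus are done) if $\mathfrak{G}$ is not an incidence graph or
$a_1, \ldots, a_n \notin \mathrm{adom}(\mathfrak{B}_{\mathfrak{G}})$.
Otherwise, we must have
$(a_1, \ldots, a_n) \in q_{\Phi_1,\mathbf{S}}(\mathfrak{B}_{\mathfrak{G}})$
and $\mathfrak{B}_{\mathfrak{G}}$ is non-empty. Since
$q_{\Phi_1,\mathbf{S}} \subseteq q_{\Phi_2,\mathbf{S}}$, we get
$(a_1, \ldots, a_n) \in q_{\Phi_2,\mathbf{S}}(\mathfrak{B}_{\mathfrak{G}})$.
Another application of Point 2 yields
$(a_1, \ldots, a_n) \in q_{\Psi_2,\mathbf{S}_I}(\mathfrak{G})$ as
required. □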



C. Proofs for Section H] 



Theorem 12 In each case, the following query languages are equally
expressive:

• ($\mathcal{ALCU}$,AQ), ($\mathcal{SHIU}$,AQ), unary simple MDDlog,
and generalized coCSP with one constant symbol;

• ($\mathcal{ALC}$,AQ), ($\mathcal{SHI}$,AQ), unary connected simple
MDDlog, and generalized coCSPs with one constant symbol such that
all templates are identical except for the interpretation of the
constant symbol;

• ($\mathcal{ALC}$,BAQ), ($\mathcal{SHI}$,BAQ), Boolean connected
simple MDDlog, and coCSP;

• ($\mathcal{ALCU}$,BAQ), ($\mathcal{SHIU}$,BAQ), Boolean simple
MDDlog, and generalized coCSP.

Moreover, given the ontology-mediated query or monadic datalog
program, the corresponding CSP can always be constructed in
exponential time.

Proof. The equivalences between the OBDA languages and 
fragments of monadic disjunctive datalog have been proved 
already. 

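We start by proving the equivalence of ($\mathcal{ALCU}$,AQ) and
generalized coCSP with one constant. We employ the notation
introduced in the proof of Theorem 1. Assume $\mathbf{S}$,
$\mathcal{O}$, and $A(x)$ are given, where $\mathcal{O}$ is an
$\mathcal{ALCU}$-ontology. We assume that
$\mathcal{O} \not\models \top \sqsubseteq A$ (otherwise
$d \in q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$ for all
$\mathbf{S}$-instances $\mathfrak{D}$ and all
$d \in \mathrm{adom}(\mathfrak{D})$ and so the query defined by
$(\mathbf{S}, \mathcal{O}, A(x))$ is trivial). Call a type
$\tau \subseteq \mathrm{sub}(\mathcal{O})$ realizable for
$\mathcal{O}$ if there exist
$\mathfrak{A} \in \mathrm{Mod}(\mathcal{O})$ and $d$ in the domain
of $\mathfrak{A}$ such that $C \in \tau$ iff
$\mathfrak{A} \models C^*[d]$ holds for all
$C \in \mathrm{sub}(\mathcal{O})$. Let $\mathcal{C}$ be the set of
all maximal sets $T$ of realizable types for $\mathcal{O}$ such that
there exists $\mathfrak{A} \in \mathrm{Mod}(\mathcal{O})$ satisfying
exactly the types in $T$ and such that $A \notin \tau$ for at least
one $\tau \in T$. For a binary relation symbol $R$ we call types
$\tau_1, \tau_2$ $R$-coherent if there does not exist
$\forall R.C \in \tau_1$ with $C \notin \tau_2$ and there does not
exist $\exists R.C \notin \tau_1$ such that $C \in \tau_2$.

With each $T \in \mathcal{C}$ we associate the $\mathbf{S}$-structure
$\mathfrak{B}_T$ with domain $T$ and facts $B(\tau)$, $B \in \tau$,
$B \in \mathbf{S}$, and $R(\tau_1, \tau_2)$, $\tau_1, \tau_2$
$R$-coherent and $R \in \mathbf{S}$. Set

$\mathcal{F} = \{(\mathfrak{B}_T, \tau) \mid \tau \in T \in \mathcal{C},\ A \notin \tau\}$

Now one can show that for every $\mathbf{S}$-instance $\mathfrak{D}$
and $d \in \mathrm{adom}(\mathfrak{D})$, there exists
$(\mathfrak{B}_T, \tau) \in \mathcal{F}$ with
$(\mathfrak{D}, d) \to (\mathfrak{B}_T, \tau)$ iff
$d \notin q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$. Thus, the
query defined by $(\mathbf{S}, \mathcal{O}, A(x))$ is equivalent to
the query defined by $\mathcal{F}$.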
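The query defined by a finite set of templates with one constant can be evaluated by a homomorphism search: an element is an answer exactly when the instance, with that element distinguished, maps into none of the templates. The Python sketch below uses brute-force search and is meant purely as an illustration for small inputs. The encoding of facts as pairs of a relation name and a tuple of elements, and the names `has_hom` and `cocsp_answers`, are assumptions made here.

```python
from itertools import product


def has_hom(src_dom, src_facts, tgt_dom, tgt_facts, fixed=None):
    """Brute-force test for a homomorphism that extends `fixed`."""
    src_dom, tgt_dom = list(src_dom), list(tgt_dom)
    fixed = dict(fixed or {})
    free = [d for d in src_dom if d not in fixed]
    for image in product(tgt_dom, repeat=len(free)):
        h = dict(fixed)
        h.update(zip(free, image))
        if all((name, tuple(h[a] for a in args)) in tgt_facts
               for name, args in src_facts):
            return True
    return False


def cocsp_answers(dom, facts, templates):
    """Answers of the generalized coCSP query with one constant.

    templates: iterable of (template_domain, template_facts, b), where b
    interprets the constant. d is an answer iff (D, d) maps into no
    template (B, b).
    """
    dom = list(dom)
    return {d for d in dom
            if not any(has_hom(dom, facts, tdom, tfacts, {d: b})
                       for tdom, tfacts, b in templates)}
```

The search is exponential in the size of the instance, so it only serves to make the semantics concrete.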

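Conversely, assume that $\mathcal{F}$ is a finite set of
$\mathbf{S}$-structures with one constant. Let
$(\mathfrak{B}, b) \in \mathcal{F}$, take for every $d$ in the
domain $\mathrm{dom}(\mathfrak{B})$ of $\mathfrak{B}$ a concept name
$A_d$ not in $\mathbf{S}$, let $A$ be a fresh concept name, and set

$\mathcal{O}_{\mathfrak{B},b} = \{A_d \sqsubseteq \neg A_{d'} \mid d \neq d'\} \cup$
$\{A_d \sqcap \exists R.A_{d'} \sqsubseteq \bot \mid R(d, d') \notin \mathfrak{B}\} \cup$
$\{A_d \sqcap B \sqsubseteq \bot \mid B(d) \notin \mathfrak{B}\} \cup$
$\{\top \sqsubseteq \bigsqcup_{d \in \mathrm{dom}(\mathfrak{B})} A_d,\ \neg A_b \equiv A\}$

One can show that for every $\mathbf{S}$-instance $\mathfrak{D}$ and
$d \in \mathrm{adom}(\mathfrak{D})$,
$(\mathfrak{D}, d) \to (\mathfrak{B}, b)$ iff
$d \notin q_{\mathbf{S},\mathcal{O}_{\mathfrak{B},b},A(x)}(\mathfrak{D})$.
Now let $\mathcal{O}$ be the disjunction over all
$\mathcal{O}_{\mathfrak{B},b}$ with
$(\mathfrak{B}, b) \in \mathcal{F}$. $\mathcal{O}$ can be expressed
in $\mathcal{ALCU}$, and so the query defined by
$(\mathbf{S}, \mathcal{O}, A(x))$ is equivalent to the query defined
by $\mathcal{F}$.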

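We now prove that ($\mathcal{ALC}$,AQ) has the same expressive power
as generalized coCSPs with one constant symbol such that all
templates are identical except for the interpretation of the
constant symbol. Assume $\mathbf{S}$, $\mathcal{O}$, and $A(x)$ are
given, where $\mathcal{O}$ is an $\mathcal{ALC}$-ontology. We assume
that $\mathcal{O} \not\models \top \sqsubseteq A$. Let $T$ be the
set of all types $\tau$ that are realizable for $\mathcal{O}$.
Define the $\mathbf{S}$-structure $\mathfrak{B}$ with domain $T$ and
facts $B(\tau)$, $B \in \tau$, $B \in \mathbf{S}$, and
$R(\tau_1, \tau_2)$, $\tau_1, \tau_2$ $R$-coherent and
$R \in \mathbf{S}$. Set

$\mathcal{F} = \{(\mathfrak{B}, \tau) \mid \tau \in T,\ A \notin \tau\}$.

Now one can show that for every $\mathbf{S}$-instance $\mathfrak{D}$
and $d \in \mathrm{adom}(\mathfrak{D})$:
$(\mathfrak{D}, d) \to (\mathfrak{B}, \tau)$ for some
$(\mathfrak{B}, \tau) \in \mathcal{F}$ iff
$d \notin q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$. Thus, the
query defined by $(\mathbf{S}, \mathcal{O}, A(x))$ is equivalent to
the query defined by $\mathcal{F}$.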

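Conversely, assume that $\mathcal{F}$ is a finite set of
$\mathbf{S} \cup \{c\}$-structures which coincide except for the
interpretation of the constant symbol $c$. Take for every $d$ in the
domain $\mathrm{dom}(\mathfrak{B})$ of $\mathfrak{B}$ a concept name
$A_d$ not in $\mathbf{S}$, let $A$ be a fresh concept name, and set

$\mathcal{O} = \{A_d \sqsubseteq \neg A_{d'} \mid d \neq d'\} \cup$
$\{A_d \sqcap \exists R.A_{d'} \sqsubseteq \bot \mid R(d, d') \notin \mathfrak{B}\} \cup$
$\{A_d \sqcap B \sqsubseteq \bot \mid B(d) \notin \mathfrak{B}\} \cup$
$\{\top \sqsubseteq \bigsqcup_{d \in \mathrm{dom}(\mathfrak{B})} A_d\} \cup$
$\{\bigsqcap_{(\mathfrak{B},b) \in \mathcal{F}} \neg A_b \equiv A\}$

One can show that for every $\mathbf{S}$-instance $\mathfrak{D}$ and
$d \in \mathrm{adom}(\mathfrak{D})$,
$(\mathfrak{D}, d) \to (\mathfrak{B}, b)$ for some
$(\mathfrak{B}, b) \in \mathcal{F}$ iff
$d \notin q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$. Thus
$(\mathbf{S}, \mathcal{O}, A(x))$ expresses the same query as
$\mathcal{F}$.

Points 3 and 4 are proved similarly and left to the reader.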



Theorem 14 Query containment in ($\mathcal{SHIU}$, AQ $\cup$ BAQ) is
in NExpTime. It is NExpTime-hard already for ($\mathcal{ALC}$,AQ)
and for ($\mathcal{ALC}$,BAQ).

Proof. We provide the proof of the lower bound. The proof is by
reduction of a NExpTime-hard $2^n \times 2^n$-tiling problem. An
instance of this tiling problem is given by a natural number
$n > 0$ and a triple $(\mathbb{T}, H, V)$ with $\mathbb{T}$ a
non-empty, finite set of tile types including an initial tile
$T_{\mathrm{init}}$ to be placed on the lower left corner,
$H \subseteq \mathbb{T} \times \mathbb{T}$ a horizontal matching
relation, and $V \subseteq \mathbb{T} \times \mathbb{T}$ a vertical
matching relation. A solution for the $2^n \times 2^n$-tiling
problem for $(\mathbb{T}, H, V)$ is a map
$f : \{0, \ldots, 2^n - 1\} \times \{0, \ldots, 2^n - 1\} \to \mathbb{T}$
such that $f(0, 0) = T_{\mathrm{init}}$,
$(f(i, j), f(i + 1, j)) \in H$ for all $i < 2^n - 1$, and
$(f(i, j), f(i, j + 1)) \in V$ for all $j < 2^n - 1$. It is
NExpTime-complete to decide whether an instance of the
$2^n \times 2^n$-tiling problem has a solution.
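The three conditions on a solution are easy to verify for a given map, which is what makes the problem a natural source of NExpTime-hardness: a candidate has exponentially many cells, but each cell is checked locally. The following Python sketch illustrates the check. The dictionary encoding of the map and the name `is_solution` are assumptions made for illustration.

```python
def is_solution(f, n, t_init, H, V):
    """Check whether f is a solution of the 2^n x 2^n tiling instance.

    f: dict mapping (i, j) to a tile type, for 0 <= i, j < 2**n
    H, V: sets of pairs of tile types (horizontal / vertical matching)
    """
    size = 2 ** n
    if f[(0, 0)] != t_init:
        return False
    for i in range(size):
        for j in range(size):
            # Horizontal neighbors must match according to H.
            if i < size - 1 and (f[(i, j)], f[(i + 1, j)]) not in H:
                return False
            # Vertical neighbors must match according to V.
            if j < size - 1 and (f[(i, j)], f[(i, j + 1)]) not in V:
                return False
    return True
```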

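For the reduction, let $n > 0$ and $(\mathbb{T}, H, V)$ be an
instance of the $2^n \times 2^n$-tiling problem with
$\mathbb{T} = \{T_1, \ldots, T_p\}$. We construct a schema
$\mathbf{S}$, two $\mathcal{ALC}$-ontologies $\mathcal{O}_1$ and
$\mathcal{O}_2$, and a query $E(x)$ with $E$ a unary relation symbol
such that $(\mathbb{T}, H, V)$ has a solution if and only if
$q_{\mathbf{S},\mathcal{O}_1,E(x)} \subseteq q_{\mathbf{S},\mathcal{O}_2,E(x)}$
if and only if
$q_{\mathbf{S},\mathcal{O}_1,\exists x.E(x)} \subseteq q_{\mathbf{S},\mathcal{O}_2,\exists x.E(x)}$.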

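We first define an ontology $\mathcal{G}$ (for grid) which encodes
the $2^n \times 2^n$-grid. To define $\mathcal{G}$, we use role
names $x$ and $y$ to represent the $2^n \times 2^n$-grid and two
binary counters $X$ and $Y$ for counting from $0$ to $2^n - 1$. The
counters use concept names
$X_0, \ldots, X_{n-1}, \overline{X}_0, \ldots, \overline{X}_{n-1}$
and
$Y_0, \ldots, Y_{n-1}, \overline{Y}_0, \ldots, \overline{Y}_{n-1}$
as their bits, respectively. $\mathcal{G}$ contains the inclusions

$X_i \sqsubseteq \neg \overline{X}_i, \quad Y_i \sqsubseteq \neg \overline{Y}_i$

for $i = 0, \ldots, n - 1$. Counters are relevant only if the
concept

$\mathsf{Def} = \big(\bigsqcap_{i=0..n-1} (X_i \sqcup \overline{X}_i)\big) \sqcap \big(\bigsqcap_{i=0..n-1} (Y_i \sqcup \overline{Y}_i)\big)$
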

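is true. $\mathcal{G}$ contains the following well-known inclusions
stating that the value of the counter $X$ is incremented when going
to $x$-successors (and $\mathsf{Def}$ is true) and the value of the
counter $Y$ is incremented when going to $y$-successors (and
$\mathsf{Def}$ is true): for $k = 0, \ldots, n - 1$,

$\mathsf{Def} \sqcap \bigsqcap_{j=0..k-1} X_j \sqsubseteq P_k$

where

$P_k = (X_k \to \forall x.(\mathsf{Def} \to \overline{X}_k)) \sqcap (\overline{X}_k \to \forall x.(\mathsf{Def} \to X_k))$

and

$\mathsf{Def} \sqcap \bigsqcup_{j=0..k-1} \overline{X}_j \sqsubseteq Q_k$

where

$Q_k = (X_k \to \forall x.(\mathsf{Def} \to X_k)) \sqcap (\overline{X}_k \to \forall x.(\mathsf{Def} \to \overline{X}_k))$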

and similarly for $Y$ and $y$. $\mathcal{G}$ also states that the
value of the counter $X$ does not change when going to
$y$-successors and the value of the counter $Y$ does not change when
going to $x$-successors: for $i = 0, \ldots, n - 1$,

$\mathsf{Def} \sqcap X_i \sqsubseteq \forall y.(\mathsf{Def} \to X_i), \quad \mathsf{Def} \sqcap \overline{X}_i \sqsubseteq \forall y.(\mathsf{Def} \to \overline{X}_i)$

and similarly for $Y$ and $x$. In addition, $\mathcal{G}$ states
that when the counter $X$ is $2^n - 1$, there is no $x$-successor
(with $\mathsf{Def}$) and if the counter $Y$ is $2^n - 1$, there is
no $y$-successor (with $\mathsf{Def}$):

$\mathsf{Def} \sqcap X_0 \sqcap \cdots \sqcap X_{n-1} \sqsubseteq \forall x.(\mathsf{Def} \to \bot)$

and

$\mathsf{Def} \sqcap Y_0 \sqcap \cdots \sqcap Y_{n-1} \sqsubseteq \forall y.(\mathsf{Def} \to \bot)$
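The counter inclusions implement the usual binary increment: a bit flips at a successor exactly when all lower bits are set, and is kept when some lower bit is unset. The Python sketch below simulates this rule on bit lists (least significant bit first) so that it can be checked against ordinary arithmetic. It is an illustration only, the function name is hypothetical, and it says nothing about the maximal value, where the ontology forbids a successor altogether.

```python
def successor_bits(bits):
    """Apply the flip/keep rule to every bit position.

    bits[k] is bit k of the counter value, least significant bit first.
    """
    out = []
    for k, b in enumerate(bits):
        if all(bits[:k]):
            # All lower bits are 1: the bit flips at the successor.
            out.append(1 - b)
        else:
            # Some lower bit is 0: the bit keeps its value.
            out.append(b)
    return out
```

For every value below the maximum, the result encodes the value plus one.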
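This finishes the definition of $\mathcal{G}$. Define the schema

$\mathbf{S}_{\mathcal{G}} = \{x, y, X_0, \ldots, X_{n-1}, \overline{X}_0, \ldots, \overline{X}_{n-1}\} \cup \{Y_0, \ldots, Y_{n-1}, \overline{Y}_0, \ldots, \overline{Y}_{n-1}\}$.

We set $\mathcal{O}_2 = \mathcal{G} \cup \{E \sqsubseteq E\}$ (the
latter inclusion merely serves to ensure $E$ is part of the schema
of $\mathcal{O}_2$).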

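We now extend $\mathcal{G}$ to another ontology $\mathcal{G}^*$. In
addition to the inclusions in $\mathcal{G}$, $\mathcal{G}^*$ states
that $T_{\mathrm{init}}$ holds at $(0, 0)$:

$\neg X_0 \sqcap \cdots \sqcap \neg X_{n-1} \sqcap \neg Y_0 \sqcap \cdots \sqcap \neg Y_{n-1} \sqsubseteq T_{\mathrm{init}}$

and that the tiling is complete on $\mathsf{Def}$:

$\mathsf{Def} \sqsubseteq \bigsqcup_{i=1..p} T_i$

Next, $\mathcal{G}^*$ states that if a tiling condition is violated,
then a concept name $E$ is true. For all $i \neq j$:

$T_i \sqcap T_j \sqsubseteq E$,

for all $(i, j) \notin H$:

$T_i \sqcap \exists x.T_j \sqsubseteq E$,

and for all $(i, j) \notin V$:

$T_i \sqcap \exists y.T_j \sqsubseteq E$.

Finally, $E$ is propagated along $x$ and $y$:

$\exists x.E \sqsubseteq E, \quad \exists y.E \sqsubseteq E$

We set $\mathcal{O}_1 = \mathcal{G}^*$ and show: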
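Claim. The following conditions are equivalent:

• the $2^n \times 2^n$-tiling problem for $(\mathbb{T}, H, V)$ has
no solution;

• $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_1,E(x)}$ is not contained
in $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_2,E(x)}$;

• $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_1,\exists x.E(x)}$ is not
contained in
$q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_2,\exists x.E(x)}$.
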
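Assume first that $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling. Define
an $\mathbf{S}_{\mathcal{G}}$-instance $\mathfrak{D}_{\mathcal{G}}$ as follows. We regard the pairs $(i, j)$ with
$i \le 2^n - 1$ and $j \le 2^n - 1$ as constants and let

• $x((i,j),(i+1,j)) \in \mathfrak{D}_{\mathcal{G}}$ for $i < 2^n - 1$ and

• $y((i,j),(i,j+1)) \in \mathfrak{D}_{\mathcal{G}}$ for $j < 2^n - 1$.

We also set

• $X_k(i,j) \in \mathfrak{D}_{\mathcal{G}}$ if the $k$th bit of $i$ is 1,

• $\overline{X}_k(i,j) \in \mathfrak{D}_{\mathcal{G}}$ if the $k$th bit of $i$ is 0,

• $Y_k(i,j) \in \mathfrak{D}_{\mathcal{G}}$ if the $k$th bit of $j$ is 1, and

• $\overline{Y}_k(i,j) \in \mathfrak{D}_{\mathcal{G}}$ if the $k$th bit of $j$ is 0.

Then

• $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_2,E(x)}(\mathfrak{D}_{\mathcal{G}}) = \emptyset$ and

• $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_2,\exists x.E(x)}(\mathfrak{D}_{\mathcal{G}}) = 0$

since $\mathfrak{D}_{\mathcal{G}}$ counts correctly, and hence is satisfiable w.r.t. $\mathcal{O}_2$.
However, since $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling, it follows
that

• $(0,0) \in q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_1,E(x)}(\mathfrak{D}_{\mathcal{G}})$;

• $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_1,\exists x.E(x)}(\mathfrak{D}_{\mathcal{G}}) = 1$.

We have proved Points 2 and 3.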
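
For small $n$, the grid instance used in this direction of the proof can be written out explicitly. The sketch below is illustrative only: the function name `grid_instance` and the string names `notX{k}`, `notY{k}` (standing for the complement concept names) are assumptions introduced here, and facts are represented as plain tuples.

```python
def grid_instance(n):
    """Facts of the 2^n x 2^n grid instance: role facts are (role, source, target),
    concept facts are (concept, constant)."""
    size = 2 ** n
    facts = set()
    for i in range(size):
        for j in range(size):
            if i < size - 1:
                facts.add(("x", (i, j), (i + 1, j)))  # horizontal successor
            if j < size - 1:
                facts.add(("y", (i, j), (i, j + 1)))  # vertical successor
            for k in range(n):
                # exactly one of X_k / its complement, depending on bit k of i
                facts.add((f"X{k}" if (i >> k) & 1 else f"notX{k}", (i, j)))
                # exactly one of Y_k / its complement, depending on bit k of j
                facts.add((f"Y{k}" if (j >> k) & 1 else f"notY{k}", (i, j)))
    return facts
```

For $n = 2$ this yields 12 facts for each of the two roles and $16 \cdot 4 = 64$ concept facts.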

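Conversely, assume that $(\mathfrak{T}, H, V)$ admits a $2^n \times 2^n$-tiling
given by $f : \{0, \dots, 2^n - 1\} \times \{0, \dots, 2^n - 1\} \to \mathfrak{T}$. We
show that $q_{\mathbf{S}_{\mathcal{G}},\mathcal{O}_1,\exists x.E(x)}(\mathfrak{D}) = 0$ for all $\mathbf{S}_{\mathcal{G}}$-instances $\mathfrak{D}$
which are satisfiable w.r.t. $\mathcal{O}_2$. Then Points 2 and 3 are refuted, as required.

Assume $\mathfrak{D}$ is satisfiable w.r.t. $\mathcal{O}_2$. We define a model
$(\mathsf{dom}, \mathfrak{D}')$ of $\mathcal{O}_1$ with $\mathfrak{D}' \supseteq \mathfrak{D}$ as follows: the domain of
$\mathfrak{D}'$ coincides with $\mathsf{adom}(\mathfrak{D})$. Symbols from $\mathbf{S}_{\mathcal{G}}$ are defined
in $\mathfrak{D}'$ in exactly the same way as in $\mathfrak{D}$. To define the facts
involving tile types $T_k$, associate with every $d \in \mathsf{adom}(\mathfrak{D})$
such that Def applies to $d$ the uniquely determined pair
$v(d) = (i, j)$ given to the values of the counters $X$ and $Y$ by
Def. Then set $T_k(d) \in \mathfrak{D}'$ iff $f(v(d)) = T_k$. Note that $\mathfrak{D}'$
contains no facts involving $E$. It is readily checked that the
resulting structure is a model of $\mathcal{O}_1$. □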

The following lemma reduces the problem of deciding 
FO-rewritability from generalized CSP with constants to 
generalized CSP without constants. 

Lemma 10 Let $\mathcal{F}$ be a finite set of $\mathbf{S} \cup \mathbf{c}$-structures. The
following conditions are equivalent:

1. coCSP($\mathcal{F}$) is FO-definable;

2. coCSP($\mathcal{F}^c$) is FO-definable.

Proof. If coCSP($\mathcal{F}^c$) is defined by a first-order sentence $\varphi$,
then replacing every subformula of the form $P_i(x)$ in $\varphi$ by
$x = c_i$ yields a first-order sentence defining coCSP($\mathcal{F}$).

For the converse, we make use of a characterization of FO-definability
of generalized coCSPs with constants using finite obstruction sets.
Let $\mathcal{F}$ be a finite set of $\mathbf{S} \cup \mathbf{c}$-structures.
A set $\mathcal{D}$ of $\mathbf{S} \cup \mathbf{c}$-structures is an obstruction set for CSP($\mathcal{F}$) if
for all $\mathbf{S} \cup \mathbf{c}$-structures $\mathfrak{D}$ the following conditions are equivalent:

• there exists $\mathfrak{B} \in \mathcal{F}$ such that $\mathfrak{D} \to \mathfrak{B}$;

• there does not exist $\mathfrak{A} \in \mathcal{D}$ such that $\mathfrak{A} \to \mathfrak{D}$.

It is known that, for any finite set of structures $\mathcal{F}$, coCSP($\mathcal{F}$)
is FO-definable if and only if $\mathcal{F}$ has a finite obstruction set.
This was shown in [40] for structures without constant symbols,
and follows easily from results in [36] even for the case
of structures with constants. Finally, it was shown in Proposition
A.2 (1) in [1] that if coCSP($\mathcal{F}$) has a finite obstruction
set, then so does coCSP($\mathcal{F}^c$). □
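
To make the notion of an obstruction set concrete, here is a small illustrative sketch on directed graphs (structures over a single binary relation, without constants). The template and obstruction chosen here are not taken from the paper: the template is a single directed edge, and its one-element obstruction set consists of the directed path with two edges. A digraph maps to the template exactly when it has no directed walk of length two. The names `has_hom`, `TEMPLATE`, `OBSTRUCTION` and `agrees_on` are hypothetical.

```python
from itertools import product


def has_hom(src_nodes, src_edges, dst_nodes, dst_edges):
    """Brute-force test for a homomorphism between two digraphs."""
    src_nodes, dst_edges = list(src_nodes), set(dst_edges)
    for image in product(list(dst_nodes), repeat=len(src_nodes)):
        h = dict(zip(src_nodes, image))
        if all((h[u], h[v]) in dst_edges for u, v in src_edges):
            return True
    return False


TEMPLATE = ([0, 1], [(0, 1)])                              # a single directed edge
OBSTRUCTION = (["a", "b", "c"], [("a", "b"), ("b", "c")])  # directed path with two edges


def agrees_on(nodes, edges):
    """Check the two obstruction-set conditions against each other for one digraph."""
    maps_to_template = has_hom(nodes, edges, *TEMPLATE)
    contains_obstruction = has_hom(*OBSTRUCTION, nodes, edges)
    return maps_to_template == (not contains_obstruction)
```

Running `agrees_on` over all digraphs on three nodes is a quick sanity check that the two conditions in the definition coincide for this template.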

The following lemma reduces the problem of deciding FO- 
definability from generalized CSP without constants to CSP 
without constants. 

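Lemma 11 Let $\mathcal{F}$ be a finite set of $\mathbf{S} \cup \mathbf{c}$-structures.

• If coCSP($\mathfrak{B}$) is FO-definable for all $\mathfrak{B} \in \mathcal{F}$, then
coCSP($\mathcal{F}$) is FO-definable.

• Conversely, if all $\mathfrak{B} \in \mathcal{F}$ are mutually homomorphically
incomparable, and coCSP($\mathcal{F}$) is FO-definable, then each
coCSP($\mathfrak{B}$), $\mathfrak{B} \in \mathcal{F}$, is FO-definable.

Proof. For Point 1 choose for every $\mathfrak{B} \in \mathcal{F}$ an FO-sentence
$\varphi_{\mathfrak{B}}$ such that $(\mathsf{dom}, \mathfrak{D}) \models \varphi_{\mathfrak{B}}$ iff $\mathfrak{D} \not\to \mathfrak{B}$ for all $\mathbf{S}$-instances
$\mathfrak{D}$. Let $\varphi$ be the conjunction over all $\varphi_{\mathfrak{B}}$ with $\mathfrak{B} \in \mathcal{F}$. Then
$(\mathsf{dom}, \mathfrak{D}) \models \varphi$ iff $\mathfrak{D} \not\to \mathfrak{B}$ for any $\mathfrak{B} \in \mathcal{F}$ holds for all
$\mathbf{S}$-instances $\mathfrak{D}$, as required.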

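To prove the other direction we require the notion of a
critical obstruction: for a set $\mathcal{G}$ of $\mathbf{S}$-structures, an $\mathbf{S}$-structure $\mathfrak{A}$
is called a critical obstruction for CSP($\mathcal{G}$) iff $\mathfrak{A} \not\to \mathfrak{B}$ for any $\mathfrak{B} \in \mathcal{G}$ but for
any proper substructure $\mathfrak{A}'$ of $\mathfrak{A}$ there exists a $\mathfrak{B} \in \mathcal{G}$ such
that $\mathfrak{A}' \to \mathfrak{B}$. It is readily checked that coCSP($\mathcal{G}$) has a finite
obstruction set iff there only exist finitely many critical
obstructions for CSP($\mathcal{G}$).

For Point 2 assume that all $\mathfrak{B} \in \mathcal{F}$ are mutually homomorphically
incomparable and that coCSP($\mathcal{F}$) is FO-definable.
Assume for a proof by contradiction that
coCSP($\mathfrak{B}_0$) is not FO-definable for some $\mathfrak{B}_0 \in \mathcal{F}$. Then the
set $\mathcal{C}$ of critical obstructions for CSP($\mathfrak{B}_0$) is infinite. Let $\mathfrak{B}_0'$
be a substructure of $\mathfrak{B}_0$ such that no proper substructure of
$\mathfrak{B}_0'$ can be homomorphically mapped to any $\mathfrak{B} \in \mathcal{F} \setminus \{\mathfrak{B}_0\}$.
It is readily checked that the disjoint unions $\mathfrak{A} \uplus \mathfrak{B}_0'$,
$\mathfrak{A} \in \mathcal{C}$, are critical obstructions for CSP($\mathcal{F}$). Thus coCSP($\mathcal{F}$)
is not FO-definable and we have derived a contradiction. □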



Next, we move on to datalog-definability.

Lemma 12 Let $\mathcal{F}$ be a finite set of $\mathbf{S} \cup \mathbf{c}$-structures.

1. If coCSP($\mathfrak{B}^c$) is datalog-definable for all $\mathfrak{B} \in \mathcal{F}$, then
coCSP($\mathcal{F}$) is datalog-definable.

2. Conversely, if all $\mathfrak{B} \in \mathcal{F}$ are mutually homomorphically
incomparable, and coCSP($\mathcal{F}$) is datalog-definable, then
each coCSP($\mathfrak{B}^c$), $\mathfrak{B} \in \mathcal{F}$, is datalog-definable.

Proof. We first prove Point 1. If
each coCSP($\mathfrak{B}^c$) is datalog-definable, then, since datalog is
closed under conjunction, we also have that coCSP($\mathcal{F}^c$) is
datalog-definable. Let $\Pi$ be a datalog program that defines
coCSP($\mathcal{F}^c$). A datalog program $\Pi'$ defining coCSP($\mathcal{F}$) may
be obtained as follows: let $|\mathbf{c}| = n$. We fix a sequence of
distinct fresh variables $\mathbf{y} = y_1 \dots y_n$. We increase the arity
of each IDB in $\Pi$, including the goal predicate, by $n$, we
replace every atom $R(\mathbf{x})$, with $R$ an IDB, by $R(\mathbf{x}, \mathbf{y})$, and we
replace $P_i(x)$ by $x = y_i$. The resulting datalog program $\Pi'$
defines coCSP($\mathcal{F}$).

For the converse direction, we make use of a characterization
of datalog-definability in terms of obstruction sets
of bounded treewidth. Recall from the proof of Lemma 10
the notion of an obstruction set for a set of structures. Suppose
that coCSP($\mathcal{F}$) is definable by a datalog program whose
rules contain at most $k$ variables. Then $\mathcal{F}$ has an obstruction
set of treewidth $k$, namely, the set of all canonical structures
of non-recursive datalog programs obtained by unfolding the
given datalog program finitely many times (a standard argument).

We claim that, in fact, each $\mathfrak{B} \in \mathcal{F}$ has an obstruction
set of treewidth $k$. We prove this claim by contraposition: if
some $\mathfrak{B} \in \mathcal{F}$ does not have an obstruction set of treewidth
at most $k$, there is a structure $\mathfrak{A}$ such that $\mathfrak{A} \not\to \mathfrak{B}$, while, at
the same time, $\mathfrak{B}' \to \mathfrak{A}$ implies $\mathfrak{B}' \to \mathfrak{B}$ for all structures
$\mathfrak{B}'$ of treewidth at most $k$. Now, take $\mathfrak{A}'$ to be the disjoint
union of $\mathfrak{A}$ and $\mathfrak{B}$. Then we have that $\mathfrak{A}'$ does not map
homomorphically into any structure in $\mathcal{F}$ (here, we are
using also the fact that $\mathcal{F}$ consists of homomorphically
incomparable structures). At the same time, $\mathfrak{B}' \to \mathfrak{A}'$ implies
$\mathfrak{B}' \to \mathfrak{B}$ for all structures $\mathfrak{B}'$ of treewidth at most $k$. Therefore,
coCSP($\mathcal{F}$) has no obstruction set of bounded treewidth,
a contradiction.

So far, we have shown that, for each $\mathfrak{B} \in \mathcal{F}$, coCSP($\mathfrak{B}$)
has an obstruction set of bounded treewidth. By Proposition
A.2 (1) in [1], we have that, for all structures $\mathfrak{A}$ with constant
symbols, if coCSP($\mathfrak{A}$) has an obstruction set of bounded
treewidth, then coCSP($\mathfrak{A}^c$) has an obstruction set of bounded
treewidth too (although it is not explicitly stated, it can easily
be verified that the relevant construction used there preserves
treewidth). Thus, we obtain that, for each $\mathfrak{B} \in \mathcal{F}$,
coCSP($\mathfrak{B}^c$) has an obstruction set of bounded treewidth. It was
shown in [20] that, for any structure $\mathfrak{A}$ without constant symbols,
coCSP($\mathfrak{A}$) is datalog-definable if and only if $\mathfrak{A}$ has an
obstruction set of bounded treewidth. Therefore we have
that, for each $\mathfrak{B} \in \mathcal{F}$, coCSP($\mathfrak{B}^c$) is datalog-definable. □
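
The syntactic transformation from $\Pi$ to $\Pi'$ used for Point 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a rule is a pair `(head, body)`, an atom is a pair `(predicate, argument tuple)`, the marker predicates for the constants are assumed to be named `P1, ..., Pn`, and the variable names `y1, ..., yn` are assumed not to occur in the input program. The function name `lift_program` is hypothetical.

```python
def lift_program(rules, idb_preds, n):
    """Extend every IDB atom by the fresh variables y1..yn and
    replace each marker atom Pi(x) by the equality x = yi."""
    ys = tuple(f"y{i}" for i in range(1, n + 1))
    marker = {f"P{i}": f"y{i}" for i in range(1, n + 1)}

    def lift(atom):
        pred, args = atom
        if pred in idb_preds:
            return (pred, tuple(args) + ys)       # arity increased by n
        if pred in marker:
            return ("=", (args[0], marker[pred]))  # Pi(x) becomes x = yi
        return (pred, tuple(args))                 # other EDB atoms are unchanged

    return [(lift(head), [lift(atom) for atom in body]) for head, body in rules]
```

For instance, with $n = 2$, the rule `goal() :- R(x, z), P1(x), Q(z)` with IDBs `goal` and `Q` becomes `goal(y1, y2) :- R(x, z), x = y1, Q(z, y1, y2)`.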



The above lemmas, together, establish Proposition [?].



We now proceed with the proof of Theorem 16 



Proposition 6 FO-definability and datalog-definability are 
decidable in NP even when the input is a generalized CSP 
with constant symbols. 

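Proof. Let a finite set of structures $\mathcal{F} = \{\mathfrak{B}_1, \dots, \mathfrak{B}_n\}$ be
given. It suffices to guess a subset $\mathcal{F}' \subseteq \mathcal{F}$, and to verify
that (i) coCSP($\mathfrak{B}^c$) is FO-definable (respectively, datalog-definable)
for each $\mathfrak{B} \in \mathcal{F}'$, and (ii) for each $\mathfrak{B} \in \mathcal{F}$ there
is a $\mathfrak{B}' \in \mathcal{F}'$ such that $\mathfrak{B} \to \mathfrak{B}'$. This can clearly be done in
NP. □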

We now give the lower bound proofs for Theorem 16.

Lemma 13 It is NEXPTIME-hard to decide FO-rewritability
of queries in ($\mathcal{ALC}$, AQ) and of queries in
($\mathcal{ALC}$, BAQ).

Proof. We prove the lower bound and employ for the reduction
the same tiling problem as in the lower bound proof of
Theorem 14. We also employ the ontologies constructed in
the proof of Theorem 14.

For the reduction, let $n > 0$ and $(\mathfrak{T}, H, V)$ be an instance
of the $2^n \times 2^n$-tiling problem with $\mathfrak{T} = \{T_1, \dots, T_p\}$. We
construct a schema $\mathbf{S}$, an $\mathcal{ALC}$-ontology $\mathcal{O}$ and a query $A(x)$
such that $(\mathfrak{T}, H, V)$ has a solution if and only if $q_{\mathbf{S},\mathcal{O},A(x)}$ is
FO-rewritable if and only if $q_{\mathbf{S},\mathcal{O},\exists x.A(x)}$ is FO-rewritable.

We consider the ontology $\mathcal{G}$, its extension $\mathcal{G}^*$, and the
schema $\mathbf{S}_{\mathcal{G}}$ from the proof of Theorem 14. To define $\mathcal{O}$,
we take a fresh role name $S$ and two concept names $A$ and
$F$ and set

$$\mathcal{O} = \mathcal{G}^* \cup \{\exists S.E \sqsubseteq E, \; E \sqcap F \sqsubseteq A\}$$

and $\mathbf{S} = \mathbf{S}_{\mathcal{G}} \cup \{S, F\}$.

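Claim. The following conditions are equivalent:

• $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling;

• $q_{\mathbf{S},\mathcal{O},A(x)}$ is not FO-rewritable;

• $q_{\mathbf{S},\mathcal{O},\exists x.A(x)}$ is not FO-rewritable.

Assume that $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling. $q_{\mathbf{S},\mathcal{O},A(x)}$
is not FO-rewritable iff there does not exist a finite set $\mathcal{D}$ of
$\mathbf{S} \cup \{c\}$-structures (an obstruction set) such that the following
conditions are equivalent for every $\mathbf{S}$-instance $\mathfrak{D}$ and $d \in \mathsf{adom}(\mathfrak{D})$:

1. $d \in q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$.

2. there exists $\mathfrak{A} \in \mathcal{D}$ such that $(\mathfrak{A}, a) \to (\mathfrak{D}, d)$.

We show that no finite obstruction set exists. To this end, we
define $\mathbf{S}$-instances $\mathfrak{D}_m$ as the union of $\mathfrak{D}_{\mathcal{G}}$ and the facts

$$F(a_0), S(a_0, a_1), \dots, S(a_m, (0, 0)).$$

It is readily checked that

• $a_0 \in q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D}_m)$ for all $m > 0$;

• $a_0 \notin q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D}'_m)$, where $\mathfrak{D}'_m$ results from $\mathfrak{D}_m$ by
removing some fact $S(a_k, a_{k+1})$ from $\mathfrak{D}_m$.

It follows immediately that no finite obstruction set exists.
The argument for $q_{\mathbf{S},\mathcal{O},\exists x.A(x)}$ is similar.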
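
The role of the $S$-chain in this argument can be illustrated with a deliberately simplified sketch. It assumes, as in the case where no tiling exists, that $E$ is already entailed at the grid origin, and it only simulates the two added inclusions $\exists S.E \sqsubseteq E$ and $E \sqcap F \sqsubseteq A$ by forward propagation. The rest of the ontology is ignored, so this is not a reasoner for $\mathcal{O}$. The function name and the constant name `origin` (standing for $(0,0)$) are hypothetical.

```python
def derives_A_at_a0(m, removed=None):
    """Propagate E backwards along S-facts from the origin; F holds only at a0."""
    s_facts = {(f"a{k}", f"a{k + 1}") for k in range(m)} | {(f"a{m}", "origin")}
    if removed is not None:
        s_facts.discard(removed)
    entailed_e = {"origin"}  # assumption: E is entailed at (0, 0)
    changed = True
    while changed:
        changed = False
        for source, target in s_facts:
            # exists S.E is subsumed by E: an S-edge into an E-point makes the source an E-point
            if target in entailed_e and source not in entailed_e:
                entailed_e.add(source)
                changed = True
    # E and F together give A, and F(a0) is the only F-fact
    return "a0" in entailed_e
```

With the full chain, $A$ is derived at $a_0$ for every length $m$, while removing any single $S$-fact cuts the propagation. Since the chains are arbitrarily long, no finite set of bounded-size structures can capture all of them.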

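Conversely, assume that $(\mathfrak{T}, H, V)$ has a $2^n \times 2^n$-tiling
given by $f : \{0, \dots, 2^n - 1\} \times \{0, \dots, 2^n - 1\} \to \mathfrak{T}$.
We have to show that there exists an FO-formula $\varphi(x)$
over $\mathbf{S}$ such that for all $\mathbf{S}$-instances $\mathfrak{D}$ and $d \in \mathsf{adom}(\mathfrak{D})$,
$(\mathsf{adom}(\mathfrak{D}), \mathfrak{D}) \models \varphi[d]$ iff $d \in q_{\mathbf{S},\mathcal{O},A(x)}(\mathfrak{D})$.

Note that one can easily construct a first-order sentence
$\varphi_{\mathcal{G}}$ over $\mathbf{S}_{\mathcal{G}}$ such that, for all $\mathbf{S}_{\mathcal{G}}$-instances $\mathfrak{D}$, the following
are equivalent:

• $\mathfrak{D}$ is not satisfiable w.r.t. $\mathcal{G}$;

• $(\mathsf{adom}(\mathfrak{D}), \mathfrak{D}) \models \varphi_{\mathcal{G}}$.

We fix such a sentence $\varphi_{\mathcal{G}}$ and show that the following are
equivalent for every $\mathbf{S}$-instance $\mathfrak{D}$:

• $(\mathsf{adom}(\mathfrak{D}), \mathfrak{D}) \models \varphi_{\mathcal{G}}$;

The direction from Point 1 to Point 2 is trivial. Conversely,
assume that $(\mathsf{adom}(\mathfrak{D}), \mathfrak{D}) \not\models \varphi_{\mathcal{G}}$. Then $\mathfrak{D}$ is satisfiable
w.r.t. $\mathcal{G}$. We define a model $(\mathsf{dom}, \mathfrak{D}')$ of $\mathcal{O}$ with $\mathfrak{D}' \supseteq \mathfrak{D}$ as
follows. The domain of $\mathfrak{D}'$ coincides with $\mathsf{adom}(\mathfrak{D})$. Symbols
from $\mathbf{S}$ are defined in $\mathfrak{D}'$ in exactly the same way as
in $\mathfrak{D}$. To define the facts involving tile types $T_k$, associate
with every $d \in \mathsf{adom}(\mathfrak{D})$ such that Def applies to $d$ the
uniquely determined pair $v(d) = (i, j)$ given to the values
of the counters $X$ and $Y$ by Def. Then set $T_k(d) \in \mathfrak{D}'$ iff
$f(v(d)) = T_k$. Note that $\mathfrak{D}'$ contains no facts involving $E$
or $A$. It is readily checked that the resulting structure is a
model of $\mathcal{O}$, as required. □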

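Lemma 14 It is NEXPTIME-hard to decide datalog-rewritability
of queries in ($\mathcal{ALC}$, AQ) and of queries in
($\mathcal{ALC}$, BAQ).

Proof. The proof is based on a modification of the proof
of Lemma 13. For the reduction, let $n > 0$ and $(\mathfrak{T}, H, V)$
be an instance of the $2^n \times 2^n$-tiling problem with $\mathfrak{T} =
\{T_1, \dots, T_p\}$. We construct a schema $\mathbf{S}$, an $\mathcal{ALC}$-ontology
$\mathcal{O}'$ and a query $A(x)$ such that $(\mathfrak{T}, H, V)$ has a solution if
and only if $q_{\mathbf{S},\mathcal{O}',A(x)}$ is datalog-rewritable if and only if
$q_{\mathbf{S},\mathcal{O}',\exists x.A(x)}$ is datalog-rewritable.

We consider the ontology $\mathcal{G}$, its extension $\mathcal{G}^*$, and the
schema $\mathbf{S}_{\mathcal{G}}$ from the proof of Theorem 14. To define $\mathcal{O}'$
we take fresh role names $S$ and $H$ and fresh concept names
$P_1, P_2, P_3$ and encode the 3-colorability problem as follows:

$$\begin{array}{rcl}
\mathcal{O}' & = & \mathcal{G}^* \cup \{\exists S.E \sqsubseteq E, \; \exists H.A \sqsubseteq A\} \; \cup \\
 & & \{E \sqsubseteq P_1 \sqcup P_2 \sqcup P_3\} \; \cup \\
 & & \{P_i \sqcap P_j \sqsubseteq A \mid 1 \le i < j \le 3\} \; \cup \\
 & & \{P_i \sqcap \exists H.P_i \sqsubseteq A \mid 1 \le i \le 3\}
\end{array}$$

and $\mathbf{S} = \mathbf{S}_{\mathcal{G}} \cup \{S, H\}$.

Claim. The following conditions are equivalent:

• $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling;

• $q_{\mathbf{S},\mathcal{O}',A(x)}$ is not datalog-rewritable;

• $q_{\mathbf{S},\mathcal{O}',\exists x.A(x)}$ is not datalog-rewritable.

Assume that $(\mathfrak{T}, H, V)$ admits no $2^n \times 2^n$-tiling. For any
connected undirected graph $G$, we identify some $w$ in $G$ with
$(0, 0)$ and define an $\mathbf{S}$-instance $\mathfrak{D}$ as the union of $\mathfrak{D}_{\mathcal{G}}$ and the
facts $S(d, d')$ for all $d, d'$ in $G$ and $H(d, d')$ for every edge
$\{d, d'\}$ in $G$. It is readily checked that

• $(0, 0) \in q_{\mathbf{S},\mathcal{O}',A(x)}(\mathfrak{D})$ iff $G$ is not 3-colorable;

• $q_{\mathbf{S},\mathcal{O}',\exists x.A(x)}(\mathfrak{D}) = 1$ iff $G$ is not 3-colorable.

Since non-3-colorability of graphs is not definable in datalog,
it follows immediately that neither $q_{\mathbf{S},\mathcal{O}',A(x)}$ nor
$q_{\mathbf{S},\mathcal{O}',\exists x.A(x)}$ is datalog-rewritable.

Conversely, if $(\mathfrak{T}, H, V)$ admits a $2^n \times 2^n$-tiling then one
can show datalog-rewritability using exactly the same argument
as in the proof of Lemma 13. □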
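
The graph part of the instance used in the proof above, together with a brute-force 3-colorability test to compare against, can be sketched as follows. This is only an illustration: the function names are hypothetical, each undirected edge is assumed to be stored as one $H$-fact in the orientation in which it is listed, and the grid part $\mathfrak{D}_{\mathcal{G}}$ of the instance is omitted.

```python
from itertools import product


def graph_facts(vertices, edges):
    """S-facts for all pairs of vertices and one H-fact per listed edge."""
    facts = {("S", d, e) for d in vertices for e in vertices}
    facts |= {("H", d, e) for d, e in edges}
    return facts


def is_3_colorable(vertices, edges):
    """Brute-force search over all assignments of three colors."""
    vertices = list(vertices)
    for colors in product(range(3), repeat=len(vertices)):
        color = dict(zip(vertices, colors))
        if all(color[d] != color[e] for d, e in edges):
            return True
    return False
```

Intuitively, the colors correspond to the choice among $P_1, P_2, P_3$ forced at every point where $E$ holds, and a monochromatic $H$-edge is what triggers $A$ via $P_i \sqcap \exists H.P_i \sqsubseteq A$.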



We now prove the undecidability results for $\mathcal{ALCF}$. In
[8, 32], alternative definitions of query containment and FO-rewritability
are employed which consider only instances
that are satisfiable w.r.t. the ontologies involved. We say that
$(\mathbf{S}, \mathcal{O}_1, q_1)$ is contained in $(\mathbf{S}, \mathcal{O}_2, q_2)$ w.r.t. consistent instances
if $q_{(\mathbf{S},\mathcal{O}_1,q_1)}(\mathfrak{D}) \subseteq q_{(\mathbf{S},\mathcal{O}_2,q_2)}(\mathfrak{D})$ for all $\mathbf{S}$-instances
$\mathfrak{D}$ such that $\mathfrak{D}$ is satisfiable w.r.t. $\mathcal{O}_1$. Similarly, a query
$(\mathbf{S}, \mathcal{O}, q)$ is FO-rewritable w.r.t. consistent instances if there
exists an FO-query $q'$ such that $q'(\mathfrak{D}) = q_{(\mathbf{S},\mathcal{O},q)}(\mathfrak{D})$ for
all $\mathbf{S}$-instances $\mathfrak{D}$ that are satisfiable w.r.t. $\mathcal{O}$. Undecidability
of query containment w.r.t. consistent instances and of FO-rewritability
w.r.t. consistent instances were proven respectively
in [8] and [32]. Here we show how the proofs can be
modified to work for query containment, FO-rewritability,
and datalog-rewritability as defined in this paper.

Theorem 22 Query containment, FO-rewritability, and
datalog-rewritability are all undecidable for queries in
($\mathcal{ALCF}$, AQ) and queries in ($\mathcal{ALCF}$, BAQ).

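Proof. The proof is by reduction of the following finite
rectangle tiling problem. An instance of the finite rectangle
tiling problem is given by a triple $\mathfrak{P} = (\mathfrak{T}, H, V)$ with

• $\mathfrak{T} = \{T_1, \dots, T_p\}$ a non-empty, finite set of tile types
including an initial tile $T_{\mathsf{init}}$ to be placed on the lower
left corner, a final tile $T_{\mathsf{final}}$ to be placed on the upper
right corner, and sets $\mathfrak{U} \subseteq \mathfrak{T}$ and $\mathfrak{R} \subseteq \mathfrak{T}$ of tile types
to be placed on the upper and right borders respectively,
satisfying $\mathfrak{U} \cap \mathfrak{R} = \{T_{\mathsf{final}}\}$;

• $H \subseteq \mathfrak{T} \times \mathfrak{T}$ a horizontal matching relation; and

• $V \subseteq \mathfrak{T} \times \mathfrak{T}$ a vertical matching relation.

A tiling for $(\mathfrak{T}, H, V)$ is a map $f : \{0, \dots, n\} \times
\{0, \dots, m\} \to \mathfrak{T}$ such that $n, m > 0$,

• $f(0, 0) = T_{\mathsf{init}}$,

• $f(n, m) = T_{\mathsf{final}}$,

• $f(n, j) \in \mathfrak{R}$ for all $0 \le j \le m$;

• $f(i, j) \notin \mathfrak{R}$ for all $i < n$ and $0 \le j \le m$;

• $f(i, m) \in \mathfrak{U}$ for all $0 \le i \le n$;

• $f(i, j) \notin \mathfrak{U}$ for all $0 \le i \le n$ and $0 \le j < m$;

• $(f(i, j), f(i + 1, j)) \in H$ for all $0 \le i < n$, and

• $(f(i, j), f(i, j + 1)) \in V$ for all $0 \le j < m$.

Thus, we can assume that $H$, $V$, $\mathfrak{U}$, and $\mathfrak{R}$ are such that:

• if $(T_i, T_j) \in H$, then $T_i \in \mathfrak{U}$ if and only if $T_j \in \mathfrak{U}$;

• if $T_i \in \mathfrak{U}$, then there exists no $T_j$ with $(T_i, T_j) \in V$ or
$(T_j, T_i) \in V$;

• if $(T_i, T_j) \in V$, then $T_i \in \mathfrak{R}$ if and only if $T_j \in \mathfrak{R}$;

• if $T_i \in \mathfrak{R}$, then there exists no $T_j$ with $(T_i, T_j) \in H$ or
$(T_j, T_i) \in H$.

It is undecidable whether an instance $\mathfrak{P}$ of the finite rectangle
tiling problem has a tiling.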
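
The conditions on a tiling can be checked mechanically for a given candidate map. The sketch below is a hedged illustration: the function name and argument names are hypothetical, tile types are arbitrary hashable values, `f` is a dictionary from grid positions to tile types, and the border conditions are read as stating that tiles from the right (resp. upper) border set occur exactly in column $n$ (resp. row $m$).

```python
def is_tiling(f, n, m, t_init, t_final, upper, right, H, V):
    """Check the tiling conditions for f on the grid {0..n} x {0..m}."""
    if f[(0, 0)] != t_init or f[(n, m)] != t_final:
        return False
    for i in range(n + 1):
        for j in range(m + 1):
            tile = f[(i, j)]
            if (tile in right) != (i == n):   # right-border tiles exactly in column n
                return False
            if (tile in upper) != (j == m):   # upper-border tiles exactly in row m
                return False
            if i < n and (tile, f[(i + 1, j)]) not in H:  # horizontal matching
                return False
            if j < m and (tile, f[(i, j + 1)]) not in V:  # vertical matching
                return False
    return True
```

Checking a candidate is easy; the undecidability stems from the fact that the dimensions $n$ and $m$ of a potential tiling are not bounded in advance.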

Fix a particular $\mathfrak{P} = (\mathfrak{T}, H, V)$. For the data schema,
we use $\mathbf{S} = \{T_1, \dots, T_p, x, y, x^-, y^-\}$, where $T_1, \dots, T_p$
are treated as concept names, and $x$, $y$, $x^-$, and $y^-$ are
role names. We use $x$ and $y$ to specify horizontal and
vertical adjacency of points in the rectangle, and the role
names $x^-$ and $y^-$ to simulate the inverses of $x$ and $y$ (note
that since $x^-$ and $y^-$ are regular role names, they need
not be interpreted as the inverses of $x$ and $y$). We construct
an $\mathcal{ALCF}$-ontology $\mathcal{O}_{\mathfrak{P}}$ which asserts functionality
of $x$, $y$, $x^-$, $y^-$ and contains inclusions using additional concept
names $U$, $R$, $Y$, $I_x$, $I_y$, $C$, $Z_{c,1}$, $Z_{c,2}$, $Z_{x,1}$, $Z_{x,2}$, $Z_{y,1}$, $Z_{y,2}$.
The concept names $U$ and $R$ are used to mark the upper and
right border of the rectangle, $Y$ is used to mark points in
the rectangle, and the remaining concept names are used for
technical purposes explained below. In the following, for
$e \in \{c, x, y\}$, we let $B_e$ range over all Boolean combinations
of the concept names $Z_{e,1}$ and $Z_{e,2}$, i.e., over all concepts
$L_1 \sqcap L_2$ where $L_i$ is a literal over $Z_{e,i}$, for $i \in \{1, 2\}$.
The ontology $\mathcal{O}_{\mathfrak{P}}$ contains the following concept inclusions,
where $(T_i, T_j) \in H$ and $(T_i, T_\ell) \in V$:

$$\begin{array}{rcl}
T_{\mathsf{final}} & \sqsubseteq & Y \sqcap U \sqcap R \\
\exists x.(U \sqcap Y \sqcap T_j) \sqcap I_x \sqcap T_i & \sqsubseteq & U \sqcap Y \\
\exists y.(R \sqcap Y \sqcap T_\ell) \sqcap I_y \sqcap T_i & \sqsubseteq & R \sqcap Y \\
\exists x.(T_j \sqcap Y \sqcap \exists y.Y) \sqcap \exists y.(T_\ell \sqcap Y \sqcap \exists x.Y) \sqcap I_x \sqcap I_y \sqcap C \sqcap T_i & \sqsubseteq & Y \\
\exists x.\exists y.B_c \sqcap \exists y.\exists x.B_c & \sqsubseteq & C \\
B_x \sqcap \exists x.\exists x^-.B_x & \sqsubseteq & I_x \\
B_y \sqcap \exists y.\exists y^-.B_y & \sqsubseteq & I_y \\
T_i & \sqsubseteq & \forall y.\bot \\
T_j & \sqsubseteq & \forall x.\bot \\
U & \sqsubseteq & \forall x.U \\
R & \sqsubseteq & \forall y.R \\
\bigsqcup_{1 \le s < t \le p} T_s \sqcap T_t & \sqsubseteq & \bot
\end{array}$$

where, in the inclusions $T_i \sqsubseteq \forall y.\bot$ and $T_j \sqsubseteq \forall x.\bot$, $T_i \in \mathfrak{U}$ and $T_j \in \mathfrak{R}$.

The first four inclusions propagate the concept $Y$ downwards
and leftwards starting from a point marked with the
final tile $T_{\mathsf{final}}$. Note that these inclusions enforce the horizontal
and vertical matching conditions. The concept inclusion
with right-hand side $C$ serves to enforce confluence,
i.e., $C$ is entailed at a constant $a$ if there is a constant $b$ that
is both an $x$-$y$-successor and a $y$-$x$-successor of $a$. This is so
because, intuitively, $B_c$ is universally quantified: if confluence
fails, then we can interpret $Z_{c,1}$ and $Z_{c,2}$ so that neither
of the two conjuncts on the left-hand side of the inclusion for
$C$ is satisfied. In a similar manner, the inclusion for $I_x$ (resp.
$I_y$) is used to ensure that $x^-$ (resp. $y^-$) acts as the inverse of
$x$ (resp. $y$) at all points in the rectangle.

The following property can be obtained by a minor modi- 
fication of Lemma 30 in pTf : 

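Lemma 15 $\mathfrak{P}$ admits a tiling if and only if there is an $\mathbf{S}$-instance
$\mathfrak{D}$ which is consistent with $\mathcal{O}_{\mathfrak{P}}$ and such that
$q_{\mathbf{S},\mathcal{O}_{\mathfrak{P}},T_{\mathsf{init}}(x) \wedge Y(x)}(\mathfrak{D}) \neq \emptyset$.

Let $\varphi_{\mathfrak{P}}$ be the first-order translation of the conjunction
of all $T_i \sqsubseteq \forall y.\bot$, $T_i \in \mathfrak{U}$, $T_j \sqsubseteq \forall x.\bot$, $T_j \in \mathfrak{R}$, and of
$\bigsqcup_{1 \le s < t \le p} T_s \sqcap T_t \sqsubseteq \bot$. The following is readily checked:

Claim. For all $\mathbf{S}$-instances $\mathfrak{D}$, $(\mathsf{adom}(\mathfrak{D}), \mathfrak{D}) \models \varphi_{\mathfrak{P}}$ iff $\mathfrak{D}$ is
satisfiable w.r.t. $\mathcal{O}_{\mathfrak{P}}$.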

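We now prove undecidability of query containment. Let
$E$ be a fresh concept name and let

$$\mathcal{O}_2 = \mathcal{O}_{\mathfrak{P}} \cup \{E \sqsubseteq E\}, \qquad \mathcal{O}_1 = \mathcal{O}_{\mathfrak{P}} \cup \{Y \sqcap T_{\mathsf{init}} \sqsubseteq E\}$$

Now one can prove that the following conditions are equivalent:

• $\mathfrak{P}$ admits a tiling;

• $(\mathbf{S}, \mathcal{O}_1, E(x))$ is not contained in $(\mathbf{S}, \mathcal{O}_2, E(x))$;

• $(\mathbf{S}, \mathcal{O}_1, \exists x.E(x))$ is not contained in $(\mathbf{S}, \mathcal{O}_2, \exists x.E(x))$.

Assume first that $\mathfrak{P}$ admits a tiling. Then by Lemma 15,
there is an $\mathbf{S}$-instance $\mathfrak{D}$ which is consistent with $\mathcal{O}_{\mathfrak{P}}$ and
such that $q_{\mathbf{S},\mathcal{O}_{\mathfrak{P}},T_{\mathsf{init}}(x) \wedge Y(x)}(\mathfrak{D}) \neq \emptyset$. It follows immediately
that $q_{\mathbf{S},\mathcal{O}_1,E(x)}(\mathfrak{D}) \neq \emptyset$ and $q_{\mathbf{S},\mathcal{O}_1,\exists x.E(x)}(\mathfrak{D}) = 1$.
On the other hand, since $\mathfrak{D}$ is consistent with $\mathcal{O}_2$, and $E$ appears
only trivially in $\mathcal{O}_2$, we have $q_{\mathbf{S},\mathcal{O}_2,E(x)}(\mathfrak{D}) = \emptyset$ and
$q_{\mathbf{S},\mathcal{O}_2,\exists x.E(x)}(\mathfrak{D}) = 0$.

Next suppose that $\mathfrak{P}$ does not admit a tiling, and let $\mathfrak{D}$ be
an $\mathbf{S}$-instance which is consistent with $\mathcal{O}_1$. By Lemma 15,
$q_{\mathbf{S},\mathcal{O}_{\mathfrak{P}},T_{\mathsf{init}}(x) \wedge Y(x)}(\mathfrak{D}) = \emptyset$, and hence $q_{\mathbf{S},\mathcal{O}_1,\exists x.E(x)}(\mathfrak{D}) = 0$.
The desired containments trivially follow.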

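To prove undecidability of FO-rewritability, we expand
$\mathcal{O}_1$ to a new ontology $\mathcal{O}_3$. To define $\mathcal{O}_3$ we take a fresh role
name $S$ and two concept names $A$ and $F$ and set

$$\mathcal{O}_3 = \mathcal{O}_1 \cup \{\exists S.E \sqsubseteq E, \; E \sqcap F \sqsubseteq A\}$$

and $\mathbf{S}_3 = \mathbf{S} \cup \{S, F\}$.

Claim. The following conditions are equivalent:

• $\mathfrak{P}$ admits a tiling;

• $q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}$ is not FO-rewritable;

• $q_{\mathbf{S}_3,\mathcal{O}_3,\exists x.A(x)}$ is not FO-rewritable.

Assume first that $\mathfrak{P}$ admits a tiling. By Lemma 15 we
can find an $\mathbf{S}$-instance $\mathfrak{D}_{\mathfrak{P}}$ which is consistent with $\mathcal{O}_{\mathfrak{P}}$
and $b \in \mathsf{adom}(\mathfrak{D}_{\mathfrak{P}})$ such that $b \in q_{\mathbf{S},\mathcal{O}_{\mathfrak{P}},T_{\mathsf{init}}(x) \wedge Y(x)}(\mathfrak{D}_{\mathfrak{P}})$,
and hence $b \in q_{\mathbf{S},\mathcal{O}_1,E(x)}(\mathfrak{D}_{\mathfrak{P}})$. We can use essentially the
same argument as in Lemma 13 to show that $q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}$ and
$q_{\mathbf{S}_3,\mathcal{O}_3,\exists x.A(x)}$ are not FO-rewritable. Specifically, we construct
$\mathbf{S}_3$-instances $\mathfrak{D}_m$ by taking the union of $\mathfrak{D}_{\mathfrak{P}}$ and the facts

$$F(a_0), S(a_0, a_1), \dots, S(a_m, b).$$

It is readily checked that

• $a_0 \in q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}(\mathfrak{D}_m)$ for all $m > 0$;

• $a_0 \notin q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}(\mathfrak{D}'_m)$, where $\mathfrak{D}'_m$ results from $\mathfrak{D}_m$ by
removing some fact $S(a_k, a_{k+1})$ from $\mathfrak{D}_m$.

It follows that no finite obstruction set exists, and hence that
$q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}$ is not FO-rewritable. We can proceed similarly
for $q_{\mathbf{S}_3,\mathcal{O}_3,\exists x.A(x)}$.

Assume now that $\mathfrak{P}$ does not admit a tiling. Then for
every $\mathbf{S}_3$-instance $\mathfrak{D}$, $\mathfrak{D}$ is satisfiable w.r.t. $\mathcal{O}_{\mathfrak{P}}$ if and only if
$q_{\mathbf{S}_3,\mathcal{O}_3,\exists x.A(x)}(\mathfrak{D}) = 0$. Thus, the query defined by $\neg\varphi_{\mathfrak{P}}$ is
equivalent to $q_{\mathbf{S}_3,\mathcal{O}_3,\exists x.A(x)}$, and the query defined by
$(x = x) \wedge \neg\varphi_{\mathfrak{P}}$ is equivalent to $q_{\mathbf{S}_3,\mathcal{O}_3,A(x)}$.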

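To prove undecidability of datalog-rewritability, we expand
$\mathcal{O}_1$ to a new ontology $\mathcal{O}_4$. To define $\mathcal{O}_4$, we take fresh
role names $S$ and $H$ and fresh concept names $P_1, P_2, P_3$ and
encode the 3-colorability problem as follows:

$$\begin{array}{rcl}
\mathcal{O}_4 & = & \mathcal{O}_1 \cup \{\exists S.E \sqsubseteq E, \; \exists H.A \sqsubseteq A\} \; \cup \\
 & & \{E \sqsubseteq P_1 \sqcup P_2 \sqcup P_3\} \; \cup \\
 & & \{P_i \sqcap P_j \sqsubseteq A \mid 1 \le i < j \le 3\} \; \cup \\
 & & \{P_i \sqcap \exists H.P_i \sqsubseteq A \mid 1 \le i \le 3\}
\end{array}$$

We use the schema $\mathbf{S}_4 = \mathbf{S} \cup \{S, H\}$.

Claim. The following conditions are equivalent:

• $\mathfrak{P}$ admits a tiling;

• $q_{\mathbf{S}_4,\mathcal{O}_4,A(x)}$ is not datalog-rewritable;

• $q_{\mathbf{S}_4,\mathcal{O}_4,\exists x.A(x)}$ is not datalog-rewritable.

First suppose that $\mathfrak{P}$ admits a tiling. We have seen previously
that this implies the existence of an $\mathbf{S}$-instance $\mathfrak{D}_{\mathfrak{P}}$ which
is consistent with $\mathcal{O}_{\mathfrak{P}}$ and contains $b \in \mathsf{adom}(\mathfrak{D}_{\mathfrak{P}})$ such
that $b \in q_{\mathbf{S},\mathcal{O}_1,E(x)}(\mathfrak{D}_{\mathfrak{P}})$. We proceed similarly to Lemma
14. Given a connected undirected graph $G$, we define an
$\mathbf{S}_4$-instance $\mathfrak{D}$ as the union of $\mathfrak{D}_{\mathfrak{P}}$ and the facts $S(d, d')$ for
all $d, d'$ in $G$ and $H(d, d')$ for every edge $\{d, d'\}$ in $G$. It is
readily checked that

• $b \in q_{\mathbf{S}_4,\mathcal{O}_4,A(x)}(\mathfrak{D})$ iff $G$ is not 3-colorable;

• $q_{\mathbf{S}_4,\mathcal{O}_4,\exists x.A(x)}(\mathfrak{D}) = 1$ iff $G$ is not 3-colorable.

It follows directly that neither $q_{\mathbf{S}_4,\mathcal{O}_4,A(x)}$ nor $q_{\mathbf{S}_4,\mathcal{O}_4,\exists x.A(x)}$
is datalog-rewritable.

Next suppose that $\mathfrak{P}$ does not admit a tiling. Then for
every $\mathbf{S}_4$-instance $\mathfrak{D}$, we have that $\mathfrak{D}$ is satisfiable w.r.t. $\mathcal{O}_{\mathfrak{P}}$ if
and only if $q_{\mathbf{S}_4,\mathcal{O}_4,\exists x.A(x)}(\mathfrak{D}) = 0$. We can then simply reuse
the FO-rewritings $\neg\varphi_{\mathfrak{P}}$ and $(x = x) \wedge \neg\varphi_{\mathfrak{P}}$ from above,
since these can be equivalently expressed as datalog queries.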